I  INTRODUCTION


This report describes a one-year research program to evaluate the
effectiveness of automatic signature verification based on three-axis signature
dynamics.

There  were  two  major  aspects  of  the  research  effort: 

•  Data collection.

•  Performance  analysis  to  estimate  the  access  time  and  Type  I/Type  II 
error  curves  for  the  signature  verification  system. 

Over  a  four-month  period,  5,220  signatures  and  1,740  numeric  sequences 
were  collected  from  59  subjects.  These  data  were  collected  both  with  the 
subjects  sitting  at  a  table  and  standing  at  a  counter.  Twelve  trained  forgers 
attempted  648  forgeries.  The  forgers  were  given  copies  of  the  true  signers' 
signatures,  instructed  in  how  the  signature  verification  system  works  and  what 
it measures, allowed to watch video tapes of the true signers writing their
signatures,  and  allowed  to  practice  as  much  as  they  desired  over  a  three-week 
period.  These  data  and  the  data  collection  protocol  are  described  in  detail 
in  Section  II. 

Signature verification algorithms and associated data base analysis
techniques are discussed in Section III. The primary focus is on the features and
"rubbery" correlation algorithms for signature verification, and on a
discriminant analysis approach to subject identification based on a handwritten
sequence of numerals.

A  detailed  summary  of  the  performance  analysis  results  is  given  in 
Section  IV.  Estimates  for  the  average  access  time  and  Type  I/Type  II  error 
curves  are  presented  for  the  features  and  rubbery  correlation  signature- 
verification  techniques  for  a  variety  of  operating  conditions.  The  results  of 
the  subject  identification  trials  based  on  a  handwritten  numeral  sequence  and 
a  discussion  of  the  user  acceptability  of  the  system  are  also  given  in 
Section  IV. 

A  brief  summary  of  the  major  results  of  the  study  and  recommendations  for 
future  work  are  given  in  Section  V. 

Because  of  the  proprietary  nature  of  some  of  the  software  programs  used 
in  the  research,  copies  of  these  programs  and  associated  documentation  will  be 
delivered  to  RADC  under  separate  cover.  A  magnetic  tape  containing  all  the 
data  collected  in  a  format  compatible  with  RADC's  PDP  11/70  operating  under 
RSX 11-M will similarly be provided. Documentation for the magnetic tape,
including a test program for reading data from the tape, is given in
Appendix A of this report.
II  DESCRIPTION OF THE DATA BASE AND
DATA COLLECTION PROCEDURES


A.  Summary of Data Collected


The  collection  of  true-signer  data  took  place  over  a  four-month  period 
from  the  beginning  of  June  to  the  end  of  September  1980.  A  list  of  the  59 
subjects  who  participated  in  the  data  base  and  the  number  of  signatures  for 
each  is  given  in  Table  1.  Subjects  are  identified  by  their  initials.  A  total 
of 5,220 true signatures was collected. These data will be delivered to RADC
in the form of a magnetic tape in a format compatible with the PDP 11/70
computer (see Appendix A).


Table 1

NUMBER OF SIGNATURES AND NUMERAL STRINGS COLLECTED FROM
EACH SUBJECT IN THE TRUE-SIGNER DATA BASE

Table 1 (Concluded)

Subject    Number of    Number of              Total
          Signatures  Numeral Strings  (Signatures + Numerals)

GEW            90            30                  120
HEP            78            26                  104
HFS            84            28                  112
JCZ            66            22                   88
JEE            90            30                  120
JEM            96            32                  128
JEP           102            34                  136
JJS           114            38                  152
JLP           102            34                  136
JNH            84            28                  112
JRL            90            30                  120
KCN            84            28                  112
KES           108            36                  144
LAL            84            28                  112
LEL           120            40                  160
MAB            96            32                  128
MAN            66            22                   88
MER            42            14                   56
MFA            84            28                  112
MRC           108            36                  144
OEK           102            34                  136
PER            30            10                   40
PES            66            22                   88
PJP           120            40                  160
PLH            78            26                  104
RAB           102            34                  136
RTK            54            18                   72
RWH           102            34                  136
RWR            90            30                  120
SAW           108            36                  144
SDJ            84            28                  112
SEA            78            26                  104
SEC           102            34                  136
SEM            54            18                   72
SRW            90            30                  120
TDK            66            22                   88
TPP            60            20                   80
TSS           114            38                  152
VKR            84            28                  112

Total       5,220         1,740                6,960
The data base subjects were chosen at random from a large group of
volunteers at SRI. The only constraints imposed on subject selection were to
obtain an approximately equal number of women and men, about 10 percent
left-handers (based upon estimates of the percentage of left-handers in the general
population), and a range of heights, weights, and ages. Referring to Table 1,
the left-handers in the data base are CMS, FET, PER, PES, RWH, and SEC.

Thirty  of  the  59  subjects  were  women. 

In addition to the signature data, 1,740 handwritten samples of the
numeric sequence 12345 were obtained from the 59 subjects during the same data
collection period. Although not specified in the original work statement,
SRI, at the request of RADC, agreed to collect this numeral data for the
purpose of determining how well the 59 subjects could be identified* from
handwritten samples of the same set of characters. The numeric data
collected are also summarized in Table 1. The total number of responses obtained
(signatures and numerals) was 6,960.

In  addition  to  the  signature  and  numeric  data,  648  forgery  attempts  were 
obtained  from  12  trained  forgers.  A  summary  of  the  attempted  forgery  data  is 
given  in  Table  2.  The  forgery  data  will  also  be  delivered  to  RADC  on  magnetic 
tape (see Appendix A). A detailed discussion of what information was made
available  to  the  forgers  and  how  they  were  trained  is  given  in  the  next 
section  (II-B). 

The  total  amount  of  data  collected,  including  signatures,  numerals,  and 
attempted forgeries, is on the order of 25 million bytes (25 megabytes or 200
megabits).

Finally, as a separate item, each subject in the true-signer data base
was videotaped in the process of signing three signatures. As discussed in
II-B, these tapes were used in the forger training to provide the kind of
dynamic information that can be obtained by observing the true signer write
his signature.


*"Verification" and "identification" have different goals. In verification a
person makes a claim as to his identity and the system attempts to verify
this claim by comparing his handwritten signature against the computer-stored
reference or template of that person's known signature. In identification
the person does not make an a priori claim as to his identity; rather, the
system attempts to determine his identity by comparing his handwritten sample
against the set of templates for all persons in the data base to find the
closest match.
Table 2

SUMMARY OF FORGERY ATTEMPTS

Forger    True Signer    Number of Attempts

AEP           RWR                18
AEP           VKR                18
BEH           TPP                18
DEC           ASI                18
DEC           JNH                18
DEC           SAW                18
DEC           SDJ                18
GEM           ELF                18
GEM           JEE                18
JER           AEW                18
JER           GAM                18
JFL           DRB                18
JFL           MRC                18
JFL           RAB                18
PED           GEG                18
PEM           EMW                18
PEM           MAB                18
PEM           MFA                18
PEM           SEA                18
RWH           CMS                18
RWH           FET                18
RWH           PES                18
RWH           SEC                18
RWH           JEM                18
RWH           BJG                18
VEW           FJM                18
VEW           FLL                18
VEW           JRL                18
VEW           LEL                18
JSO           AAF                18
JSO           AEP                18
JSO           CEP                18
JSO           DEP                18
JSO           LAL                18
TPP           GEW                18
TPP           OEK                18

Total                           648
B.  Data Collection Protocol


1.  Data  Collection  Area 


The  data  were  collected  in  a  partially  enclosed  area  containing  a  table 
and  a  counter  (a  podium-like  stand).  For  the  reasons  discussed  below,  at  each 
data  collection  session  the  subject  wrote  signatures  both  while  sitting  down 
at  the  table  and  while  standing  at  the  counter.  The  operator  (a  research 
assistant)  sat  in  front  of  a  computer  terminal  immediately  adjacent  to  the 
partially enclosed area. Although the area was partially enclosed, the
subjects were not totally isolated from view nor acoustically shielded from the
normal computer noise. In essence, the data collection environment was
what might be expected for a personal identification system used for
access control to a computer area.


2.  True-Signer Data Base

Upon  entering  the  data  collection  area,  the  subject  was  given  a  standard 
form  on  which  to  write  his  signatures  and  numerals  for  the  session.  This  form, 
shown  in  Figure  1,  was  filled  out  ahead  of  time  with  the  subject's  name,  the 
date, and other pertinent information so that the subject was free to
concentrate on signing his signature and writing the sequences of numerals. The
operator  told  the  subject  whether  the  data  collection  session  was  to  begin  at 
the  table  or  the  counter.  To  avoid  biases,  the  order  of  collection  alternated; 
that  is,  at  one  session  the  standing  signatures  would  be  collected  first  and 
the  next  time  the  sitting  signatures  would  be  first.  If  the  table  was  first, 
the  subject  wrote  three  signatures  and  one  set  of  numerals  (12345)  sitting  at 
the  table,  and  then  wrote  three  more  signatures  and  another  numeric  sequence 
while  standing  at  the  counter.  When  the  counter  was  first,  the  process  was 
reversed.  Thus  a  data  collection  session  consisted  of  six  signatures  and  two 
numeric sequences. Three signatures under both sitting and standing
conditions were required for each data collection session, because in the
performance analysis we planned to simulate a personal identification system that
allowed up to three tries at verification.

During  the  first  session,  the  subject  was  given  brief  instructions.  He 
was  told  that  the  system  measures  forces  and  dynamics  so  that  any  unusual 
pauses  in  writing  are  likely  to  cause  the  signatures  to  be  rejected.  The 
subject  was  instructed  to  use  his  or  her  standard  signature.  A  subject  who 
typically  used  one  or  more  signature  variants  (e.g.,  a  full  middle  name  one 
time  and  only  an  initial  the  next)  was  requested  to  use  the  most  common  version 
of  the  signature.  The  subject  was  instructed  to  inform  the  research  assistant 
of  any  obvious  mistakes  such  as  leaving  out  a  middle  name  or  initial,  or  other 
gross  signature  variants.  There  were  very  few  such  mistakes  and  those  that 
occurred  were  excluded  from  the  data  base. 

The signature and numeral data for each session were collected on a real-time,
on-line basis. That is, whenever a subject wrote a signature it was
automatically digitized by the PDP 11/40 computer and written out on a large
disk (67 megabytes), including a header record consisting of the subject's
initials, the date and time, a response or index number, and various other
pertinent information. All data forms for all subjects and for all data
collection sessions, as well as hard-copy records of all program transactions,
were saved. In all, sufficient records were maintained so that whenever
questions arose about the data it was possible to reconstruct exactly what happened
during the session in question.


FIGURE 1  FORM FOR DATA COLLECTION

For  a  signature  verification  system  operating  in  the  "real  world,"  the 
users  must  cooperate  with  the  system  or  risk  being  denied  access  to  a  secure 
area,  computer  account,  or  the  like.  However,  no  such  motivation  exists  for 
a data collection effort of the type described here. Thus there is always the
danger that subjects will grow careless after the initial novelty of the
system wears off, which can lead to unnaturally large variations in the way
signatures are written and cause an artificially low estimate of system
performance (compared to a real-world system in which users are continually
motivated by the need for access). Hence, to better simulate real-world
operating conditions we offered prizes for the signatures that were most consistent
over the data collection period. The intent here was to provide at least some
motivation for the subjects to perform as they would in a real-world
environment.


3.  Forger  Data  Base 

The  basic  procedure  for  collecting  and  storing  attempted  forgery  data  was 
essentially  the  same  as  that  for  the  true-signer  data  base  described  in  the 
preceding  subsection.  This  is  to  be  expected,  because  in  the  real  world  there 
is  no  a  priori  knowledge  as  to  who  is  the  true  signer  and  who  is  the  forger, 
so both must be treated the same (up to the point of verification). For
consistency, prizes ($100, $50, and $25) were also offered for the "best"
forgeries to provide motivation for the forgers to practice and do the best
job possible.

Since  the  forgery  data  collection  procedure  was  essentially  the  same  as 
that  for  the  true-signer  data  base  described  above,  it  remains  only  to  discuss 
the  training  and  preparation  of  the  forgers. 

One of the first problems was the selection of forgers. This was
difficult because the SRI signature verification system is based on the dynamics
of a signature (i.e., the forces and motions used to create a signature) rather
than its final static image.* Thus the requirements for being a successful
forger in the SRI system are quite different from those for a "classical"
forgery, whose purpose is to duplicate the static image of a signature. For
example, in our system tracing a true signature would be one of the worst
strategies for forgery, because tracing usually results in dynamics very
different from those of the true signer even though the final result may be
essentially identical. Our approach was, therefore, to select motivated
people who had good manual dexterity and the capability of understanding the
basic concepts behind the verification system.


*See Appendices B and C for further details.
Rather than requiring the forgers to make a few attempts at all the
different signatures in the true-signer data base, we decided that a more
realistic simulation of how a real forger would operate would be to have each
of our forgers concentrate on three or four signatures. They were given
several samples of these signatures and were also given a description of how
the signature verification system operates: that it measures signature
dynamics, that timing and forces are generally important, and that some of
the typical features on which the verification is based are the total time of
the signature, average force in the three orthogonal directions and the
respective energies, the number of pen-ups and pen-downs, and so on. Each forger
was allowed 18 attempts to forge a particular signature. After the first nine
attempts he was shown a video tape with a close-up view of the subject signing
his signature. This was intended to simulate the condition in which a real
forger surreptitiously observes a person writing his signature to learn as
much as possible about the dynamics of the signature. Before the actual
forgery attempts, the forgers were allowed to practice as much as they wanted
within a three-week period. In essence, the forgers were provided with all
the information that a dedicated real-world forger could be expected to obtain.


C.  Assessment  of  Data  Quality 

When a signature, set of numerals, or any other response is written using
the SRI pen, the result is a set of three analog signals that are a time
record of the instantaneous three-axis force* on the pen tip during writing.
An example is shown in Figure 2. The question of data quality then has two
aspects:

•  How  well  the  three  analog  time  series  signals  represent  the  important 
characteristics  of  a  handwritten  signature. 

•  How  accurate  the  digitized  (discrete)  representation  is  of  the  three 
analog  time  series  signals  that  are  generated  using  the  analog-to- 
digital  converter  and  PDP  11/40  computer. 

A  discussion  of  the  SRI  pen  as  a  device  for  transducing  the  motions  used 
in  handwriting  into  analog  electrical  signals  representing  the  motions  has 
already  been  published  and  hence  will  not  be  duplicated  here.  See  Appendix  B 
for  details. 

The data base was recorded and stored in digital (discrete) form. This
approach was taken because it was more compatible with the subsequent
processing and analysis, and because discrete data can be transported relatively
simply between computers (e.g., in transferring the data base from SRI's PDP
11/40 to RADC's PDP 11/70). However, since we stored only the discrete
representations of the P,X,Y analog signals, it was important to ensure


*I.e., force on the pen tip in three orthogonal directions. When the pen tip
is vertical, the P-signal represents the downward force or pressure, and the
X and Y force signals represent the left/right and far/near forces,
respectively, in the plane of the writing surface.


FIGURE 2  P, X, AND Y FORCE SIGNALS FOR A TYPICAL SIGNATURE
(a) ORIGINAL SIGNATURE
(b) THE THREE-DIMENSIONAL SIGNALS (Y, X, AND P VERSUS TIME IN SECONDS) GENERATED
BY THE SRI PEN DURING THE WRITING OF THE ABOVE SIGNATURE

the accuracy of the discrete representations. The frequency response of the
pen signals rolls off sharply above 25 Hz, because of filtering in the
electronics, and it is reasonable to approximate the pen response as being
frequency-bandlimited with a maximum frequency of about 25 Hz. The sampling
theorem of communication theory states that for a bandlimited signal, sampling
at least twice during the period of its highest frequency component is
sufficient to completely characterize the signal in the sense that the original
analog signal can be exactly reconstructed from the discrete samples. The
minimum sampling rate for which this is possible is called the Nyquist rate,
which corresponds to sampling exactly twice during each period of the highest
frequency component. For the pen system the Nyquist sampling interval is
1/(2 x 25 Hz) = 0.02 s, which corresponds to a Nyquist rate of 50 samples/s
for each of the three analog signals. However, for safety we
sampled at twice this rate, or 100 samples/s for each signal, for a total of
300 samples/s. This ensures that no loss of information occurs in the process
of digitization and storage of the data in digital form.
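
The sampling arithmetic above can be checked with a short Python sketch. It is only an illustration: the constant and function names are hypothetical and are not taken from the data collection software, and the 5-second signature duration in the usage note is an assumed value.

```python
# Sampling-rate arithmetic for the three-channel pen signals.
# All names here are illustrative assumptions, not part of the SRI software.

MAX_FREQUENCY_HZ = 25.0   # approximate pen bandwidth stated in the text
OVERSAMPLING = 2          # the text states sampling at twice the Nyquist rate
CHANNELS = 3              # P, X, and Y


def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate (samples/s) for a signal bandlimited to max_frequency_hz."""
    return 2.0 * max_frequency_hz


nyquist = nyquist_rate(MAX_FREQUENCY_HZ)       # samples/s per channel
nyquist_interval_s = 1.0 / nyquist             # time between samples at the Nyquist rate
per_channel_rate = OVERSAMPLING * nyquist      # rate actually used per channel
total_rate = CHANNELS * per_channel_rate       # all three channels combined


def samples_for_response(duration_s):
    """Total number of stored samples (all channels) for a response of the given duration."""
    return int(round(duration_s * total_rate))
```

For example, under these assumptions a signature lasting 5 seconds would produce `samples_for_response(5.0)` stored samples across the three channels.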
During the data collection period, quality checks on the data were made
at regular intervals. These checks consisted of printing out the data for
selected signatures and displaying the P,X,Y signals in graphic form on a
Tektronix display scope. As discussed earlier, a hard-copy record of all the
data collection sessions was also maintained.

During  the  data  collection  period,  only  one  software  problem  occurred. 
Under very unusual circumstances a very narrow spike was artificially
introduced into the data. This problem, which was traced to a software error in
the  data  collection  program,  was  corrected  at  an  early  stage  and  affected  only 
a  very  small  amount  of  data.  A  computer  program  was  written  to  search  through 
all  the  data  records  to  identify  which  were  affected  by  this  error.  These 
responses  were  excluded  from  the  analysis.  There  were  occasions,  also  very 
infrequent,  when  the  pen  ran  out  of  ink  and  had  to  be  refilled,  and  on  some 
such  occasions  a  bad  signature  resulted.  The  number  of  bad  responses  from 
all  sources  is  summarized  in  Table  3  for  signatures  and  numerals. 

The  forgery  data  base  collection  began  well  after  the  true-signer  data 
base,  by  which  time  these  problems  had  been  resolved,  and  hence  all  the 
forgery  data  was  of  good  quality. 

To  summarize,  of  the  total  of  7,608  true  signatures,  numerals,  and 
attempted  forgeries,  47,  or  0.6  percent,  were  deleted.  The  rest  of  the  data 
was  of  high  quality  and  was  used  in  the  subsequent  analysis. 


Table  3 


III  DATA BASE ANALYSIS PROCEDURES

The purpose of the data base analysis is twofold:

•  To optimize the performance of the signature verification system.

•  To  provide  estimates  of  the  performance  of  the  optimized  systems, 
including  Type  I/Type  II  error  curves*  and  access  time. 

In  this  section  we  summarize  the  basic  analysis  procedures  applied  to  the  data 
base  described  in  Section  II.  The  results  of  the  performance  evaluation  are 
reported  in  the  next  section  (Section  IV). 


A.  Features  Analysis  (for  Signature  and  Forgery  Data) 

In  this  subsection  we  discuss  the  analysis  procedures  applicable  to  the 
features  approach  to  signature  verification.  For  background  we  begin  with  a 
description  of  how  that  approach  actually  works. 


1.  Features  Approach  to  Signature  Verification 

The features approach to signature verification is summarized in
Figure 3. When a person's identity is to be verified (e.g., to gain access to
a secure area) the person identifies himself to the system and writes
his  signature.  As  shown  in  the  figure,  the  pen  transduces  the  forces  and 
motions  used  in  writing  the  signature  into  a  set  of  three  analog  signals  that 
are  a  time  record  of  the  instantaneous  force  on  the  tip  of  the  pen  in  the 
three  orthogonal  directions.  The  P-signal  is  the  downward  force  or  pressure, 
and  the  X  and  Y  signals  are  the  left/right  and  far/near  forces,  respectively, 
in  the  plane  of  the  paper.  These  analog  signals  are  input  to  an  analog-to- 
digital  converter  and  digitized  at  the  rate  of  100  samples  per  second  per 
channel.  The  digitized  representations  of  the  P,X,Y  analog  signals  are  then 
processed  by  a  computer  to  extract  a  set  of  descriptors,  called  features,  of 
the  three  signals.  These  features  include  various  timing  parameters  such  as 
the  total  time  of  the  signature,  the  average  force  in  each  of  the  three 
directions (P, X, and Y) and the corresponding energies, the number of
pen-ups and pen-downs, and so on. A complete listing of the features considered
is given in III-A-2. The set of features (s_1, s_2, ..., s_n) extracted from the
discrete representations of the P, X, and Y signals forms the feature vector
when arranged in column order, as shown in Figure 3. The computer then calls
up  the  computer-stored  template  or  reference  feature  vector  corresponding  to 
the  person  whom  the  writer  claims  to  be.  The  template  vector  is  an  average 


*A  Type  I  error  occurs  when  a  true  signature  is  classified  as  a  forgery  (a 
false  rejection).  A  Type  II  error  occurs  when  a  forgery  is  classified  as  a 
true  signature  (impostor  accepted). 
feature vector constructed from a set of known true signatures. Associated
with the template vector is a vector of standard deviations for the features
(see Figure 3), which provides a measure of how variable the true signer is
from signature to signature for each feature. A measure of closeness between
the feature vector and the template vector is then computed. If the feature
vector corresponding to the signature in question is "close enough" to the
template vector, the signature is judged to be a true signature and the
person's identity is verified. If the feature vector is not close enough, the
signature in question is judged an attempted forgery. In a practical
signature verification system, the writer will often be allowed more than one
chance to be verified; that is, if the first signature does not pass the above
tests, he or she will be allowed to write one or two more signatures to be
tested for verification.
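
As an illustration only, the following Python sketch shows how a small feature vector, and a template consisting of per-feature means and standard deviations, might be computed from the digitized P, X, and Y signals. The particular eight features, the energy definition (sum of squared samples times the sample interval), the pen-down threshold, and all function names are assumptions made for this example; the 44 features actually considered are those described in Figure 4.

```python
import numpy as np

SAMPLE_RATE_HZ = 100.0  # per-channel sampling rate stated in the text


def extract_features(p, x, y, pen_down_threshold=0.0):
    """Return a small illustrative feature vector for one signature.

    Features (an assumed subset): total time, mean P/X/Y force,
    P/X/Y energy, and the number of pen-downs.
    """
    p, x, y = (np.asarray(s, dtype=float) for s in (p, x, y))
    dt = 1.0 / SAMPLE_RATE_HZ

    # The pen is treated as "down" whenever the pressure exceeds the threshold.
    down = p > pen_down_threshold
    # A pen-down event is an up-to-down transition (or starting in the down state).
    pen_downs = int(np.sum(np.diff(down.astype(int)) == 1)) + int(down[0])

    return np.array([
        len(p) * dt,                                        # total time, seconds
        p.mean(), x.mean(), y.mean(),                       # average forces
        np.sum(p**2) * dt, np.sum(x**2) * dt, np.sum(y**2) * dt,  # energies
        float(pen_downs),                                   # number of pen-downs
    ])


def build_template(feature_vectors):
    """Average feature vector and per-feature standard deviation
    computed from a set of known true signatures."""
    f = np.asarray(feature_vectors, dtype=float)
    return f.mean(axis=0), f.std(axis=0, ddof=1)
```

In this sketch the template would be built by calling `build_template` on the feature vectors of several reference signatures from one signer; the resulting standard deviations play the role of the variability vector described above.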

A  quantitative  description  of  the  computed  "measure  of  closeness"  between 
the  feature  vector  and  the  template  vector  is  given  in  Appendix  C.  In  essence, 
the  measure  of  closeness  is  a  Euclidean  distance  metric,  normalized  or 
weighted by the template standard deviation vector. When the calculated
distance metric is less than or equal to a prespecified threshold, the signature
is judged to be true; if above the threshold, it is judged to be an attempted
forgery.
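
A minimal Python sketch of this decision rule is given below. The exact form of the metric is defined in Appendix C, which is not reproduced here, so the formula used (each feature difference divided by the corresponding template standard deviation before the Euclidean norm is taken) is an assumption, as are the function names and the numerical values in the usage note. The limit of three tries follows the text.

```python
import math


def normalized_distance(features, template, sigmas):
    """Euclidean distance with each feature difference scaled by the
    template standard deviation for that feature (assumed form)."""
    return math.sqrt(sum(((f - t) / s) ** 2
                         for f, t, s in zip(features, template, sigmas)))


def verify(attempts, template, sigmas, threshold, max_tries=3):
    """Accept as soon as one attempt falls within the threshold.

    Returns (accepted, number_of_tries_used).
    """
    used = 0
    for features in attempts[:max_tries]:
        used += 1
        if normalized_distance(features, template, sigmas) <= threshold:
            return True, used
    return False, used
```

With a hypothetical template `[10.0, 2.0]`, standard deviations `[2.0, 0.5]`, and threshold 1.5, a first attempt `[20.0, 2.0]` would be rejected and a second attempt `[12.0, 2.5]` accepted, so `verify` would report acceptance on the second try.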

In the features technique, as we have seen, the forces and motions
involved in creating a signature are finally represented as a feature vector.
Clearly, if the features approach to signature verification is to be
successful, the feature vector must contain as much information as possible that is
useful for discriminating between true signatures and forgeries. The basic
purpose for collecting a data base of true signatures and attempted forgeries
is to provide data that can be analyzed to select a set of features (which
constitute the feature vector) that provide maximum discriminating power
between the true signatures and the attempted forgeries. The process of
optimizing the features technique, then, consists of selecting the "best" set
of features and an appropriate threshold for the distance metric measure of
closeness (see Appendix C for details).
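
To make the role of the threshold concrete, the Python sketch below estimates Type I and Type II error rates from two lists of distance values, one for true signatures and one for attempted forgeries, and sweeps the threshold to trace an error curve. The function names are hypothetical, any distance values supplied are the caller's own, and the sketch does not model the multiple-try procedure.

```python
def error_rates(true_distances, forgery_distances, threshold):
    """Return (type_1, type_2) error rates at a given threshold.

    Type I:  a true signature is rejected (distance above the threshold).
    Type II: a forgery is accepted (distance at or below the threshold).
    """
    type_1 = sum(d > threshold for d in true_distances) / len(true_distances)
    type_2 = sum(d <= threshold for d in forgery_distances) / len(forgery_distances)
    return type_1, type_2


def error_curve(true_distances, forgery_distances, thresholds):
    """List of (threshold, type_1, type_2) tuples over a range of thresholds."""
    return [(t, *error_rates(true_distances, forgery_distances, t))
            for t in thresholds]
```

Raising the threshold lowers the Type I rate and raises the Type II rate, so selecting an "appropriate" threshold amounts to choosing an operating point on this curve.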

The  problem  of  selecting  a  "best"  set  of  features  has  two  aspects,  which 
we  call  feature  extraction  and  feature  selection.  In  general,  there  is  no  way 
to  make  an  a  priori  determination  of  what  the  best  features  will  be  for  a 
particular  situation,  so  what  must  be  done  is  to  extract  a  relatively  large 
number  of  features  that  are  expected  to  be  useful  for  discriminating  between 
true  signatures  and  attempted  forgeries.  This  generally  results  in  a  great 
deal  of  redundancy.  The  objective  of  feature  selection  is  to  obtain  a 
reduced  set  of  features  that  contains  essentially  all  the  discriminating 
power  of  the  original  features  set.*  The  features  initially  extracted  for 


*Under  certain  assumptions  concerning  the  probability  distributions  of  the 
feature  set,  it  can  be  shown  that  the  process  of  feature  selection  cannot 
reduce  the  Type  I/Type  II  error  rates.  However,  as  a  practical  matter,  an 
improvement  in  error-rate  performance  often  results  from  feature  selection. 
See R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis
(New York: Wiley, 1973), p. 66.
the signature verification problem are described in III-A-2. The feature
selection process is discussed further in III-A-3.

2.  Feature  Extraction 

Based  on  our  knowledge  of  the  characteristics  of  the  P,  X,  and  Y  signals 
derived  from  the  pen  and  our  experience  with  previous  true-signature  and 
forgery  data  bases,  a  set  of  44  features  was  selected  as  the  starting  point 
in  the  current  data  base  analysis.  These  features  are  described  in  Figure  4. 
3.  Feature Selection


The objective of feature selection is the following: Given the
relatively large set of 44 features described in the preceding subsection, find a
subset that yields the best performance for signature verification. The
process of feature selection yields two benefits:

•  It  reduces  the  Type  I/Type  II  error  rate. 

•  It improves computational efficiency by excluding or combining
features that contain redundant information about the signature.

Before presenting the results of our analysis, we will give a brief
description of general feature selection concepts.

In general, all feature selection techniques follow the same procedure.
The starting point is a large set of features that the analyst believes to be
useful for discriminating between samples (in our case, between true
signatures and attempted forgeries). The discriminating power of each of the
features, or subsets of features, is determined by performing statistical tests
on a training set of data that is believed to adequately represent the
population of interest. The subset of features that yields the best performance
(by some criterion) and that contains the minimum number of features is the
"best" feature set. Many procedures and algorithms for performing
computerized feature selection have been devised. Some of these are based on
univariate F-ratio evaluations,* Fisher's discriminant analysis,^ information
measures such as divergence,+ and a host of ad hoc procedures. For the
current project we tried a number of these techniques. Although some of them
performed reasonably well, we were not entirely satisfied with the results.
The standard feature selection techniques are all based on a number of
assumptions about the underlying probability structure of the feature set.
The exact assumptions differ somewhat from technique to technique, but in
general it is assumed that the set of features is distributed as a
multivariate Gaussian density, that the covariance matrices (see Appendix C for a
definition of the covariance matrix) are equal, and the like. Our signature
verification features do not appear to satisfy these conditions, and the
result is that the feature selection techniques mentioned above do not operate
in an optimum fashion; that is, there is no guarantee that the feature set
obtained is the one that minimizes the Type I/Type II error rate. Because of
the somewhat unsatisfactory performance of these classical feature selection


*The  F-ratio  technique  for  feature  selection  is  described  in  many  textbooks. 
For  example,  see  W.  J.  Dixon  and  F.  J.  Massey,  Introduction  to  Statistical 
Analysis,  3rd  ed.  (New  York:  McGraw-Hill,  1969),  ch.  10;  G.  W.  Snedecor  and 
W.  G.  Cochran,  Statistical  Methods.  6th  ed.  (Iowa  State  University  Press, 
1967),  ch.  14;  and  D.  E.  Bailey,  Probability  and  Statistics  (New  York: 

Wiley,  1971),  ch.  17  to  19. 

^R.  0.  Duda  and  P.  E.  Hart,  Pattern  Classification  and  Scene  Analysis  (New 
York:  Wiley,  1943). 

+S.  Kullback,  Information  Theory  and  Stecistics  (New  York:  Wiley,  1959). 
techniques,  a  new  approach  was  devised  that  uses  as  its  basic  criterion  the 
direct  minimization  of  the  Type  I/Type  II  error  rate.*  This  effort,  which 
resulted  in  a  much  improved  feature  selection,  can  be  summarized  as  follows: 

We  began  by  extracting  a  subset  of  the  total  data  base  of  signatures.  Because 
we  had  an  equal  number  of  standing  and  sitting  signatures,  we  typically  used 
the standing signatures to select features (training the system) and the
sitting signatures to make a final estimate of the error performance.
To  cross-validate  the  results  the  procedure  was  reversed,  so  that  the  sitting 
data  were  used  as  the  training  set  and  the  standing  data  as  the  testing  set. 
This  process  of  using  different  sets  of  data  to  train  and  test  the  system 
provides  a  more  realistic  (conservative)  estimate  of  the  true  error  rates. 
(Using  the  same  data  for  training  and  testing  would  yield  unjustifiably  opti¬ 
mistic  results.)  Starting  with  the  full  set  of  44  features  and  (for  example) 
the  standing  signature  data  as  the  training  set,  we  first  calculated  the 
Type I/Type II error-rate curves† for all subsets of 43 features. We then
examined the results to determine which of the 43-feature subsets yielded the
best Type I/Type II error performance. Next we calculated the Type I/Type II
error curves for all 42-feature subsets of the best 43-feature set, then for
all 41-feature subsets of the best set of 42 features, and so on.‡ What
typically occurs in this process is illustrated in Figure 5. As useless and/or
redundant  features  are  removed,  the  error  rate  decreases  until  it  reaches  a 
minimum.  Once  this  minimum  is  reached,  excluding  more  features  results  in 
reduced  performance.  The  feature  set  that  yields  the  minimum  is  selected  as 
the  best  set. 
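
The backward-elimination search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the project: `error_rate` is a hypothetical callable standing in for the equal-error-rate calculation on the training set, and ties are broken in favor of the smaller subset. The footnoted speed-up of excluding more than one feature per iteration is not shown.

```python
from typing import Callable, FrozenSet, Tuple


def backward_eliminate(
    features: FrozenSet[int],
    error_rate: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Greedy backward elimination.

    Each round evaluates every subset obtained by removing exactly one
    feature from the current set, keeps the best of them, and remembers
    the lowest-error subset seen over the whole search.
    `error_rate` is a hypothetical stand-in for the equal-error rate.
    """
    current = features
    best_set, best_err = current, error_rate(current)
    while len(current) > 1:
        candidates = [current - {f} for f in current]
        # Sort key includes the sorted feature list so ties are deterministic.
        scored = [(error_rate(c), sorted(c), c) for c in candidates]
        err, _, current = min(scored, key=lambda item: (item[0], item[1]))
        # "<=" prefers the smaller subset when error rates are equal.
        if err <= best_err:
            best_set, best_err = current, err
    return best_set, best_err
```

Starting from 44 features, this loop evaluates at most 44 + 43 + ... + 2 = 989 subsets after the initial one, compared with 2^44 (about 1.8 × 10^13) subsets for an exhaustive search.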

The  above  procedure  is  an  approximation  to  the  more  complete  process  of 
calculating  the  Type  I/Type  II  error  rates  for  all  possible  subsets  of  the  44 
features,  which  is  computationally  prohibitive.§  Compared  to  the  classical 
techniques  for  feature  selection,  this  method  has  the  following  advantages: 

•  It requires no assumptions about the underlying probability distribution
of the feature set.

•  The  calculations  involved  are  relatively  simple  and  intuitively 
reasonable. 

•  It  selects  a  "best"  feature  set  by  choosing  the  subset  that  yields 
the  least  probability  of  error  (subject  to  the  qualification  mentioned 
above  that  not  all  possible  combinations  of  feature  subsets  are  tested 


*Generally, the classical feature selection techniques cannot be related
directly to Type I/Type II error rates except, as noted earlier, under a set
of restrictive assumptions about the probability structure of the feature
set (which are not satisfied by the signature verification features).

†The procedure for calculating these curves is described in IV-A.

‡This leave-one-out strategy can be rather time-consuming in itself. We
were able to make the process more efficient by excluding more than one
feature per iteration.

§A set of only 20 features would require more than one million Type I/Type II
calculations.
FIGURE 5  TYPE I/TYPE II EQUAL-ERROR RATE* VERSUS NUMBER OF FEATURES
*At intersection of the Type I/Type II curves


by our restricted search algorithm). This is not true of classical
feature selection procedures in general, whose results can be said to
minimize the probability of error only under a very restrictive set
of assumptions, which experience has shown are not valid for the
signature verification features.

•  It  takes  into  account  correlations  between  features  (implicitly). 
Redundant  features  (i.e.,  two  features  that  are  highly  correlated) 
are  excluded  by  the  process  of  choosing  the  minimum  point  on  the  curve 
in  Figure  5. 

4.  Type  I/Type  II  Error-Curve  Calculation  Procedures 

To  calculate  the  Type  I/Type  II  error  curves,  we  developed  an  analysis 
procedure  that  simulates  how  a  real-world  signature  verification  system  might 
operate.  This  program  includes  an  enrollment  phase  in  which  templates  are 
constructed  from  the  first  few  (typically  10  or  12)  signatures  for  each  sub¬ 
ject,  and  a  verification  phase  in  which  subsequent  true  signatures  and 
attempted  forgeries  are  compared  against  the  appropriate  templates  to  deter¬ 
mine the percentage of false rejections of true signers and of impostor
acceptances. The system allows up to three tries (signatures to be compared
against  the  template)  per  verification  trial.  If  the  first  signature  fails  to 
pass  the  verification  criteria,  a  second  signature  is  tested.  If  the  second 
also  fails,  a  third  signature  is  considered.  If  all  three  signatures  for  a 
particular  verification  trial  fail  to  pass  the  verification  criteria,  the  sub¬ 
ject  is  rejected  as  an  impostor. 

A  template  updating  procedure  was  used  to  continually  modify  templates 
based  on  successful  verification  attempts.  Each  time  a  verification  trial  was 
successful  on  the  first  try  (i.e.,  the  first  signature  satisfied  the  verifica¬ 
tion  criteria),  the  feature  vector  for  that  signature  was  added  to  the  tem¬ 
plate  vector  with  a  weighting  of  1/8.  Thus  if  a  subject's  signature  varied 
over  time,  the  template  would  track  the  change.  This  template  updating  pro¬ 
cedure  was  found  to  improve  verification  performance  by  reducing  the  percent¬ 
age  of  true-signer  rejections. 
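
The template update can be sketched as a running blend of the stored template and each accepted first-try signature. The text states only the weighting of 1/8; the complementary weight of 7/8 on the existing template is an assumption made here so that the result stays on the same scale, and the function name is hypothetical.

```python
from typing import List

UPDATE_WEIGHT = 1.0 / 8.0  # weighting stated in the text


def update_template(
    template: List[float],
    signature: List[float],
    weight: float = UPDATE_WEIGHT,
) -> List[float]:
    """Blend an accepted first-try feature vector into the template.

    Assumed form: t_new = (1 - weight) * t_old + weight * s.
    Only the 1/8 weighting comes from the report; the rest is a sketch.
    """
    if len(template) != len(signature):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - weight) * t + weight * s for t, s in zip(template, signature)]
```

Under this assumed form, a persistent shift in a subject's signature is absorbed gradually, with each accepted signature moving the template one eighth of the way toward it.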

The basic criterion used to judge whether a particular test signature was
a true signature or an attempted forgery was as follows: As in Figure 3, let
s be the feature vector representing the test signature. The components of s
are the values of the set of "best" features determined by the method
described in the preceding subsection. Let t be the computer-stored template
or reference vector and σ the associated standard deviation vector. The
determinations of t and σ are based on an enrollment set of known true
signatures (see Appendix C for explicit formulae for calculating t and σ).
Referring to Figure 3, the measure of closeness between the test signature
and the template is the weighted Euclidean distance metric


$$d(s,t) = \left[\sum_{i=1}^{f}\left(\frac{s_i - t_i}{\sigma_i}\right)^{2}\right]^{1/2}$$


where f is the number of features, $s_i$ is the value of the ith component or
feature in the feature vector s, $t_i$ is the ith component of the template
vector, and $\sigma_i$ is the standard deviation of the ith feature as
computed from a set of enrollment signatures. (See Appendix C for the reasons
for selecting this Euclidean distance metric as the measure of closeness
between the test signature and the template.)

The quantity d(s,t) is a measure of closeness between the vectors s and
t. The smaller the calculated value of d(s,t), the greater the similarity
between s and t, and therefore between the test signature represented by s and
the  computer-stored  template  t  for  the  subject  whose  identity  is  being  claimed 
by  the  person  desiring  to  be  verified. 

The  decision  rule  for  deciding  whether  a  particular  test  signature  satis¬ 
fies  the  verification  criteria  is: 

•  If $d(s,t) < d_{\mathrm{thres}}$, the signature is judged to be a true signature.

•  If $d(s,t) > d_{\mathrm{thres}}$, the signature is judged to be an attempted forgery.


The quantity $d_{\mathrm{thres}}$ is a pre-assigned threshold value selected
by using the Type I/Type II error curves to obtain the optimum trade-off
between Type I and Type II errors for the particular application of interest.
For example, for high-security applications, $d_{\mathrm{thres}}$ would likely
be set to a relatively small value to minimize the impostor acceptance rate,
while for banking applications in which the concern is usually to minimize
user inconvenience (i.e., minimize the Type I error rate) a larger value for
$d_{\mathrm{thres}}$ might be more suitable.
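
The distance measure and the three-try decision rule can be sketched together. This sketch assumes the weighted Euclidean distance is the square root of the summed squares of the differences between test and template features, each divided by that feature's enrollment standard deviation; the exact form is given in Appendix C, which is not reproduced here. Function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_distance(
    s: Sequence[float], t: Sequence[float], sigma: Sequence[float]
) -> float:
    """Assumed form: sqrt of the sum of ((s_i - t_i) / sigma_i) ** 2."""
    return math.sqrt(
        sum(((si - ti) / sg) ** 2 for si, ti, sg in zip(s, t, sigma))
    )


def verify_trial(
    signatures: Sequence[Sequence[float]],
    t: Sequence[float],
    sigma: Sequence[float],
    d_thres: float,
    max_tries: int = 3,
) -> bool:
    """Accept the trial as soon as one of up to `max_tries` signatures passes."""
    for s in signatures[:max_tries]:
        if weighted_distance(s, t, sigma) < d_thres:
            return True
    # All allowed tries failed: the subject is rejected as an impostor.
    return False
```

A smaller `d_thres` makes every signature harder to accept, which lowers impostor acceptances at the cost of more true-signer rejections.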

The  procedure  by  which  the  Type  I/Type  II  error  curves  were  estimated 
from the data base is as follows: Let $T_t$ represent the total number of
verification trials in the true-signer data base, and let $R_t$ represent the
number of trials for which a true signer was falsely rejected. Note that the
number of false rejections $R_t$ is a function of the decision threshold
while $T_t$ is not. In general, $R_t$ decreases as $d_{\mathrm{thres}}$
increases and increases as $d_{\mathrm{thres}}$ decreases.
Recall  that  each  verification  trial  allows  up  to  three  attempts,  so  that  a 
false  rejection  occurs  only  when  all  three  signatures  fail  to  satisfy  the 
verification  criteria.  The  Type  I  error  (false  rejection  rate  for  true 
signers)  is  estimated  as 


$$\text{Type I error} = \hat{E}_I = \frac{R_t}{T_t}$$


The ^ symbol is used to indicate that $\hat{E}_I$ is an estimate of the error
rate. When $\hat{E}_I$ is plotted as a function of the decision threshold,
$d_{\mathrm{thres}}$, the Type I error curve results. Similarly, the Type II
error is estimated to be


$$\text{Type II error} = \hat{E}_{II} = \frac{R_f}{T_f}$$


where $T_f$ is the total number of forger trials and $R_f$ is the number of
trials for which a forged signature passes the verification criteria (i.e.,
the number of impostor acceptances). $R_f$ is also a function of the decision
threshold; it increases with increasing $d_{\mathrm{thres}}$ and decreases
with decreasing $d_{\mathrm{thres}}$. A plot of $\hat{E}_{II}$ versus
$d_{\mathrm{thres}}$ yields the Type II error curve. The justification for
using the particular form of error-rate estimation given above is discussed
in Appendix D. $\hat{E}_I$ and $\hat{E}_{II}$ are the maximum likelihood
estimates (assuming independent trials) of the error rate for binomially
distributed random variables.
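
The estimation procedure can be sketched as a sweep over candidate thresholds. The sketch assumes each trial is stored as the list of distances for its (up to three) signatures; that data layout and the function names are hypothetical, not taken from the report.

```python
from typing import List, Sequence, Tuple


def trial_accepted(
    distances: Sequence[float], d_thres: float, max_tries: int = 3
) -> bool:
    """A trial is accepted if any of its first `max_tries` distances passes."""
    return any(d < d_thres for d in distances[:max_tries])


def error_rates(
    true_trials: Sequence[Sequence[float]],
    forger_trials: Sequence[Sequence[float]],
    d_thres: float,
) -> Tuple[float, float]:
    """Return (Type I estimate, Type II estimate) at one threshold."""
    rejected_true = sum(not trial_accepted(tr, d_thres) for tr in true_trials)
    accepted_forged = sum(trial_accepted(tr, d_thres) for tr in forger_trials)
    return rejected_true / len(true_trials), accepted_forged / len(forger_trials)


def error_curves(
    true_trials: Sequence[Sequence[float]],
    forger_trials: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Tabulate (threshold, Type I, Type II) for each candidate threshold."""
    return [
        (d,) + error_rates(true_trials, forger_trials, d) for d in thresholds
    ]
```

Plotting the second and third columns against the first gives the two error curves; the threshold at which they cross is the equal-error point used elsewhere in the report to compare results.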

$\hat{E}_I$ and $\hat{E}_{II}$ are estimates of the Type I and Type II errors, respectively,
based  on  our  data  base  of  true  signatures  and  attempted  forgeries.  The  ques¬ 
tion  then  arises  as  to  how  confident  we  are  that  these  estimates  really  cor¬ 
respond  to  the  actual  population  error  rates.  In  other  words,  our  data  base 
is  only  a  sample  drawn  from  a  larger  population  of  true  signers  and  forgers, 
and  we  must  ask  how  well  we  can  estimate  the  true  error  rates  for  the  larger 
population based on our particular sample. This leads to the concept of
confidence limits, which is discussed in Appendix E. Basically, if we say that
we have 95 percent confidence limits of ±1 percent for the Type I error rate,
this means that we are 95 percent certain, given the estimated Type I error
rate $\hat{E}_I$, that the true population error rate is within ±1 percent of
that estimate. For example, if $\hat{E}_I$ = 2 percent then the true
population error rate would be in the range 1 to 3 percent, with 95 percent
confidence.
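
As a rough illustration of how such limits relate to the number of trials, the sketch below uses the normal approximation to the binomial distribution. This is an assumption made for illustration: the method actually used is the one in Appendix E, which may differ, and the normal approximation is known to be crude when the expected number of errors is small.

```python
import math


def confidence_half_width(p_hat: float, n_trials: int, z: float = 1.96) -> float:
    """Half-width of an approximate confidence interval for an error rate.

    Uses the normal approximation; z = 1.96 corresponds to 95 percent.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n_trials)


def trials_needed(p_hat: float, half_width: float, z: float = 1.96) -> int:
    """Smallest number of independent trials giving the requested half-width."""
    return math.ceil(z * z * p_hat * (1.0 - p_hat) / half_width ** 2)
```

Under this approximation, limits of ±1 percent around an estimated error rate of 2 percent call for roughly 750 independent trials.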


5.  Individualized  Feature  Selection 


In  the  preceding  subsection  we  have  discussed  procedures  for  feature 
selection.  These  procedures  can  be  used  to  determine  a  standard  set  of  "best" 
features  to  be  used  for  all  subjects  or  to  derive  a  set  of  best  features  for 
each  subject  individually.  It  is  well  known,  both  theoretically  and  from 
practical  experience,  that  the  use  of  individualized  feature  sets  generally 
yields  better  signature  verification  performance*  (lower  Type  I/Type  II  error 
rates),  provided  that  enough  training  data  is  available  to  estimate  the  sets 
with  reasonable  statistical  confidence.  However,  the  use  of  individualized 
feature  sets  requires  a  more  complex  enrollment  procedure,  and  it  is  not  clear 
a  priori  that  the  improved  performance  is  sufficient  to  justify  its  use  for 
some  practical  applications  of  signature  verification.  In  essence,  the  prob¬ 
lem  with  individualized  feature  selection  is  that  a  large  number  of  enrollment 
signatures  is  required  from  each  subject  to  determine  individualized  feature 
sets  with  reasonable  statistical  confidence.  If  a  standard  feature  set  is 
used  (i.e.,  if  the  same  feature  set  is  applied  to  all  subjects)  on  the  order 
of  10  to  12  signatures  are  adequate  for  enrolling  a  subject.  This  seems  very 
practical  and  reasonable  for  a  real-world  signature  verification  system. 
Typically,  to  enroll  in  such  a  system  a  subject  will  sign  five  or  six  signa¬ 
tures  on  two  different  days.  The  requirements  are  quite  different  for  indi¬ 
vidualized  feature  selection.  Although  the  exact  number  of  signatures  needed 
cannot  be  determined  without  knowing  the  exact  probability  structure  of  the 
signature  verification  features  (although  they  are  definitely  non-Gaussian), 
a  standard  rule  of  thumb  in  such  instances  is  that  the  number  of  independent 
training  (enrollment)  samples  be  several  times  the  number  of  features.  Since 
we  begin  with  44  features,  this  implies  that  the  number  of  enrollment  signa¬ 
tures  should  be  quite  large,  probably  greater  than  100,  although  it  might  be 
possible,  with  less  confidence,  to  make  do  with  40  or  so  (perhaps  even  fewer  if 
the  set  of  features  from  which  to  choose  is  smaller).  In  any  case,  for  a 
real-world  application,  this  means  that  individualized  feature  selection  may 
require  a  relatively  long  enrollment  procedure.  However,  a  compromise  is  also 
possible  in  which  subject  enrollment  is  based  on  a  standard  feature  set,  and, 
as  more  signatures  are  collected  through  subsequent  verifications,  the  feature 
set  is  gradually  and  automatically  individualized.  But  this  approach  has  its 
own  disadvantage,  that  of  requiring  the  system  to  store,  at  least  temporarily, 


*This is intuitively reasonable. Since all subjects write differently, we
would expect their signatures to be best characterized by somewhat different
feature sets. For example, the total time that it takes to write the
signature is a good feature for subjects who are consistent in the timing of
their signatures but a bad feature for those who are very inconsistent in
total writing time.
the  large  number  of  feature  vectors  needed  for  the  process  of  individualized 
feature  selection. 

For  the  current  project  it  was  necessary  to  decide  whether  to  focus  the 
data  base  analysis  on  the  features  technique  using  a  standard  feature  set  or 
individualized  feature  sets  for  all  subjects.  Because  of  the  magnitude  of  the 
data processing task (i.e., feature extraction, feature selection, and
Type I/Type II error-curve calculations for many thousands of signatures) it
was not
possible  to  perform  a  complete  and  exhaustive  analysis  of  both  cases.  For 
reasons  discussed  below,  we  decided  to  emphasize  the  standard  set  of  features 
approach  and  process  only  a  few  problem  subjects  (i.e.,  the  subjects  with  an 
abnormally  high  error  rate)  using  individualized  feature  selection. 

The  data  base  collected  for  the  project  is,  to  the  best  of  our  knowledge, 
the  first  large-scale  data  base  obtained  using  a  three-axis  signature  verifi¬ 
cation  system.  Hence  we  decided  it  was  most  important  to  determine  how  well 
the  basic  signature  verification  system  performed  when  using  a  standard  set 
of  features  for  all  subjects.  This  approach  also  has  the  advantage  that  we 
can  identify  the  small  percentage  of  problem  subjects  from  the  standard  fea¬ 
ture  set  analysis  and  then  apply  the  individualized  feature  selection  process 
to determine what kind of improvement could be obtained for these problem
subjects (discussed in IV-A-3). If we had started with the individualized
feature set approach, there would have been no way to work backward to determine
how  well  the  system  performed  with  a  standard  set  of  features. 


B.  Correlation  Analysis 

The  features  technique  for  signature  verification  has  the  advantage  of 
simplicity  and  relatively  low  computational  and  template  storage  requirements. 
However,  previous  pilot  studies  indicate  that  the  use  of  more  sophisticated 
template-matching  (i.e.,  verification)  algorithms  can  result  in  substantially 
reduced  Type  I/Type  II  error  rates.  For  high-security  applications  the  poten¬ 
tially  improved  performance  of  a  more  sophisticated  verification  algorithm  may 
outweigh  the  added  complexity  and  computational  requirements.  In  the  follow¬ 
ing  we  describe  SRI's  "rubbery"  correlation  algorithm  for  signature  verifi¬ 
cation. 

In  this  algorithm,  the  P,  X,  and  Y  time-series  force  signals  of  a  test 
signature  are  correlated  mathematically  with  the  appropriate  P,  X,  and  Y 
template  signals.  If  the  correlation  is  greater  than  or  equal  to  a  preas¬ 
signed  threshold  (i.e.,  correlation  value),  the  test  signature  is  judged  a 
true  signature;  if  not,  it  is  judged  a  forgery.  However,  a  direct  mathemati¬ 
cal  correlation  generally  yields  rather  poor  performance  (specifically,  a  high 
Type  I  error  or  true-signer  rejection  rate)  because  of  the  normal  everyday 
variations  in  a  person's  signature.  Even  though  the  P,  X,  and  Y  signals  for 
two  signatures  may  seem  highly  correlated  by  a  subjective  visual  comparison, 
there  are  often  small  time  shifts  within  a  particular  test  signature  that 
cause  significant  misalignment  between  the  prominent  peaks  and  landmarks  of 
the  test  signature  P,  X,  and  Y  signals  and  the  corresponding  template  signals. 
To  compensate  for  the  normal  variations  in  a  sequence  of  true  signatures,  we 
developed  the  technique  of  "rubbery"  correlation,  in  which  an  automatic 
two-dimensional  search  is  used  to  find  an  optimal  match  (within  appropriate 
search  limits)  between  the  test  signature  and  the  template,  using  various 
combinations  of  time-base  translation  and  time-base  warping  (stretch  and  con¬ 
traction)  of  the  test  signature  P,  X,  and  Y  signals  with  respect  to  the  tem¬ 
plate  signals.  These  procedures  can  be  applied  independently  to  different 
parts of the signature; for instance, we can partition the test signature P,
X, and Y signals into halves and correlate each half with the appropriate
template  P,  X,  and  Y  signals.  It  is  also  possible  to  use  prominent  landmarks 
(which  are  usually  taken  to  be  the  pen-up  intervals  where  the  P  signal,  or 
downward  pressure,  is  zero  or  close  to  zero)  to  partition  a  signature  into 
smaller  pieces  on  which  to  apply  the  time-warping  algorithms. 

The  basic  concept  of  rubbery  correlation  can  be  illustrated  reasonably 
simply  in  one  dimension  (instead  of  three  dimensions  as  is  really  the  case 
when using the SRI pen): Let the template signal be represented as the vector
T(t) whose components are the discrete sampled values of one of the analog
signals obtained from a reference signature or template.


$$T^{\dagger}(t) = [T_1, T_2, T_3, \ldots, T_n]$$


where † indicates the vector transpose and n is the total number of discrete
samples. $T_1$ is the value of the template signal at time 1, $T_2$ is the
value at time 2, and so on.

Let  the  test  signal  V(t)  (obtained  from  a  signature  that  is  to  be  veri¬ 
fied)  be  represented  as 


$$V^{\dagger}(t) = [V_1, V_2, V_3, \ldots, V_n]$$


The  standard  Pearson  correlation  coefficient  is  defined  as* 


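$$C[T(t), V(t)] = \frac{n\sum T_i V_i - \sum T_i \sum V_i}{\left[n\sum T_i^{2} - \left(\sum T_i\right)^{2}\right]^{1/2}\left[n\sum V_i^{2} - \left(\sum V_i\right)^{2}\right]^{1/2}}$$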


The  expression  for  correlation  presented  above  is  convenient  for  the  purposes 
of  explanation  because  its  calculated  values  must  lie  between  +1  and  -1,  where 
+1  and  -1  are  the  maximum  positive  and  negative  correlations  and  0  is  no  cor¬ 
relation.  In  practice  there  are  more  efficient  ways  to  compute  correlation 
if  the  -1  to  +1  normalization  is  not  required. 


*T.  W.  Anderson,  An  Introduction  to  Multivariate  Statistical  Analysis  (New 
York:  Wiley,  1958),  p.  49. 
When  V(t)  is  correlated  against  T(t),  we  judge  V(t)  to  represent  a  true 
signature  if  the  calculated  correlation  is  larger  than  a  preselected  positive 
number,  typically  in  the  range  0.7  to  1.0.  However,  because  true  signers  have 
some  variability  in  their  signatures,  the  correlation  calculations  must  be 
made  more  flexible  to  allow  a  reasonable  range  of  phase,  amplitude,  and  time 
variations.  This  can  be  done  by  computing  the  correlation  function 


$$C[T(t), V(kt + t_0)]$$


for an allowed range of translation (i.e., for various $t_0$ in the above
equation) and stretching or shrinking (i.e., for various values of the
multiplicative constant k). The highest correlation over a specified range of
discrete values of k and $t_0$ is thus obtained. If this correlation is larger
than a
specified  value,  the  test  signature  is  judged  to  be  a  true  signature.  Further 
flexibility  is  obtained  by  breaking  the  signature  into  pieces,  either  in  fixed 
proportions such as halves or by using signal landmarks such as pen-ups,
correlating each piece allowing for the k and $t_0$ variations described above, and
combining  them  into  a  total  correlation  coefficient. 
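
The one-dimensional search over stretch and shift can be sketched as below. This is a simplified illustration, not SRI's implementation: the linear interpolation used to resample the test signal, the `min_overlap` rule for discarding poorly overlapping alignments, and all function names are assumptions, and the partitioning by halves or pen-up landmarks is not shown.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    num = n * sum(x * y for x, y in zip(a, b)) - sum_a * sum_b
    den_a = n * sum(x * x for x in a) - sum_a * sum_a
    den_b = n * sum(y * y for y in b) - sum_b * sum_b
    if den_a <= 0.0 or den_b <= 0.0:
        return 0.0  # a constant segment carries no correlation information
    return num / math.sqrt(den_a * den_b)


def sample(v: Sequence[float], pos: float) -> float:
    """Linearly interpolate v at fractional index pos, 0 <= pos <= len(v) - 1."""
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(v) - 1)
    frac = pos - lo
    return (1.0 - frac) * v[lo] + frac * v[hi]


def rubbery_correlation(
    template: Sequence[float],
    test: Sequence[float],
    scales: Sequence[float],
    shifts: Sequence[float],
    min_overlap: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (best correlation, k, t0) over the given discrete search grid."""
    best = (-1.0, 1.0, 0.0)
    for k in scales:
        for t0 in shifts:
            # Pair T_i with V(k*i + t0) wherever the warped index is in range.
            pairs = [
                (template[i], sample(test, k * i + t0))
                for i in range(len(template))
                if 0 <= k * i + t0 <= len(test) - 1
            ]
            if len(pairs) < min_overlap * len(template):
                continue
            c = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            if c > best[0]:
                best = (c, k, t0)
    return best
```

For a test signal that is simply a delayed copy of the template, the direct correlation can be low while the searched correlation recovers a value near 1 at the matching shift.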

The  procedure  for  calculating  the  Type  I/Type  II  error  curves  for  the 
rubbery  correlation  signature  verification  algorithm  is  essentially  the  same 
as  described  in  III-A-4  for  the  features  technique.  The  only  difference  is 
that  the  measure  of  closeness  between  a  test  signature  and  the  template  is  now 
the  rubbery  correlation  rather  than  the  Euclidean  distance  metric. 


C.  Features  Analysis  (for  Numeric  Sequence  Data) 

The  objective  of  collecting  and  analyzing  handwritten  numeric  sequences 
was  to  determine  how  well  subjects  could  be  discriminated  on  the  basis  of 
handwritten  samples  of  the  same  set  of  characters.  As  mentioned  earlier, 
this  is  an  identification  problem  rather  than  a  verification  problem  because 
it  is  assumed  that  the  subject  makes  no  a  priori  claim  as  to  his  identity. 

In  verification,  the  subject  makes  an  a  priori  identity  claim  and  the  test 
sample  is  compared  only  against  the  computer-stored  template  (or  reference) 
corresponding  to  the  claimed  identity.  In  identification,  the  subject  writes 
a  test  sample  that  is  compared  against  the  templates  of  all  the  subjects  to 
establish  his  identity. 
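
The difference between the two tasks can be sketched in a few lines. The nearest-template rule used for identification here is only a stand-in chosen for illustration; the analysis in this report uses discriminant analysis for identification. The distance form, the template layout, and the subject label XYZ are likewise assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical template layout: (mean feature vector, standard deviation vector).
Template = Tuple[Sequence[float], Sequence[float]]


def distance(s: Sequence[float], template: Template) -> float:
    """Assumed weighted Euclidean distance between a sample and a template."""
    t, sigma = template
    return math.sqrt(
        sum(((si - ti) / sg) ** 2 for si, ti, sg in zip(s, t, sigma))
    )


def verify(
    claimed_id: str,
    s: Sequence[float],
    templates: Dict[str, Template],
    d_thres: float,
) -> bool:
    """Verification: compare only against the template of the claimed identity."""
    return distance(s, templates[claimed_id]) < d_thres


def identify(s: Sequence[float], templates: Dict[str, Template]) -> str:
    """Identification: compare against every template and return the closest."""
    return min(templates, key=lambda subject: distance(s, templates[subject]))
```

Verification makes one comparison per sample, while identification makes one comparison per enrolled subject and has no threshold in this simple form.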

Our analysis of the numeric sequences is based on the 44 features
described in III-A-1. The set of 44 features was extracted from each of the
1,740  numeric  sequences  in  the  data  base  (see  II-A  for  a  description  of  that 
data  base)  using  our  PDP  11/40  and  written  to  magnetic  tape.  A  computer  pro¬ 
gram  was  written  to  translate  this  tape  into  a  format  compatible  with  SRI's 
CDC  6400  computer.  The  feature  data  was  then  analyzed  using  the  Statistical 
Package  for  the  Social  Sciences  (SPSS)  supported  by  the  CDC  6400. 

The  SPSS  was  used  because  it  is  ideal  for  the  kind  of  identification  or 
classification  problem  posed  by  the  numeric  sequences.  The  specific  program 
used  for  the  current  analysis,  known  as  DISCRIMINANT,  is  based  on  standard 
discriminant  analysis  procedures  for  classifying  unknown  test  samples  into 
one  of  many  groups.  Since  this  program  is  very  well  documented*  and  is  avail¬ 
able  on  most  large-scale  computers,  we  will  not  discuss  it  in  detail  here. 

The  results  of  the  identification  analysis  are  given  in  IV-C. 


*N.  H.  Nie  et  al.,  SPSS,  2nd  ed.  (New  York:  McGraw-Hill,  1975). 

M.  J.  Norusis,  "SPSS  Statistical  Algorithms  (Release  8.0),"  Computer  Soft¬ 
ware  for  Data  Analysis,  Suite  3300,  444  N.  Michigan  Ave. ,  Chicago,  Illinois 
60611. 
IV  PERFORMANCE  EVALUATION 


In  Section  III  we  described  the  signature  verification  algorithms  and 
data  base  analysis  procedures.  In  IV-A  we  summarize  the  results  of  the  data 
base  analysis,  in  terms  of  the  Type  I/Type  II  error  curves  and  average 
access  (or  verification)  time,  for  the  features  techniques  using  a  standard 
feature  set  that  is  the  same  for  all  subjects.  For  typical  conditions  the 
equal-error  rate*  is  on  the  order  of  one  percent.  In  IV-A-3  and  IV-B  we  show 
the  improvement  in  performance  that  may  be  obtainable  by  using  individualized 
feature  selection  and  the  rubbery  correlation  algorithm,  respectively.  In 
IV-C  we  present  the  results  of  a  subject  identification  study  based  on  a 
sequence  of  handwritten  numerals,  and  in  IV-D  we  discuss  the  human  engineering 
aspects  of  the  process  (i.e.,  how  the  subjects  felt  about  using  the  system). 


A.  Features  Technique  for  Signature  Verification 

The  procedure  for  selecting  features  and  estimating  the  Type  I/Type  II 
error  curves  was  discussed  in  III-A-3  and  III-A-4,  respectively.  The  set  of 
44  features  (descriptors  of  the  P,  X,  Y  force  signals  generated  by  the  SRI  pen 
during  the  writing  of  a  signature)  used  in  the  analysis  was  also  described  in 
III-A-3.  In  this  subsection  (IV-A-1)  we  begin  by  deriving  the  average  time 
required  for  verification  (i.e.,  the  average  access  time).  In  IV-A-2  we 
present  Type  I/Type  II  error  curves  based  upon  a  standard  set†  of  "best" 
features  derived  from  the  original  set  of  44  features. 


1.  Access  Time 

The average signature length of the 58‡ data base subjects is 5.7 seconds.
Added to this is a 1.5 second delay that is used to determine when the
signature has been completed (i.e., no writing for 1.5 seconds indicates the
signature is over) and a processing time of 0.5 seconds. The processing time
varies  with  the  length  of  the  signature,  and  we  have  taken  a  worst-case  esti¬ 
mate.  The  signature  verification  system  allows  up  to  three  tries  (signatures) 


*As discussed later in more detail, the equal-error rate is the error rate at
which the Type I/Type II error curves intersect (i.e., where Type I error =
Type II error).

†By a standard set of features we mean a single set of features that is used
for all subjects.

‡Subject PER was excluded from the data base analysis because other
commitments prevented him from participating for the full length of the data
collection period, and too few signatures of his were available to be analyzed.
per  verification  trial,  but  the  analysis  shows  that  on  the  average  only  1.1 
attempts  were  required.  The  average  access  time  is  thus  estimated  to  be 

Average access time = (5.7 + 1.5 + 0.5) × 1.1

≈ 8.5 seconds


2.  Type  I/Type  II  Error  Curves 

In this section we present Type I/Type II error curves, beginning with the
so-called "Trues vs. Trues" error curves. These curves are calculated as
follows. The known true signatures of a particular subject, say subject ABC,
are compared against his own template. The percent rejected as a function of
the decision threshold is the Type I error curve.* The Type II error is
calculated by comparing the true signatures of all the other subjects against
the ABC template. The percent accepted as a function of the decision
threshold is the Type II error curve. This procedure is then repeated for all
subjects in the data base. Clearly the Trues vs. Trues error rate is a kind
of confusion rate, comparable to the situation in which one subject claims
the identity of another subject but attempts to use his own signature for
verification. However, this is not a very realistic measure of the system's
Type I/Type II error curves and is included here only because this type of
error-rate calculation is very common in the literature. Following the
presentation of the Trues vs. Trues Type I/Type II error curves, we present
the Trues vs. Attempted Forgeries Type I/Type II error curves. In this case
the Type I error curves are calculated in the same way as above, but the
Type II error curves are computed using attempted forgery data.
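The curve calculation just described can be sketched in Python as follows. This is a minimal illustration, not the report's program. It assumes each comparison against a template yields a single distance-like score and that a signature is accepted when its score is at or below the decision threshold. It does not model the three tries allowed per verification trial. The scores are made up for the example.

```python
def error_curves(true_scores, impostor_scores, thresholds):
    """Return (type1, type2) percent-error lists, one entry per threshold.

    Assumption: a score is a distance to the template, and a signature is
    accepted when its score is less than or equal to the threshold.
    """
    type1, type2 = [], []
    for t in thresholds:
        rejected_true = sum(1 for s in true_scores if s > t)
        accepted_false = sum(1 for s in impostor_scores if s <= t)
        type1.append(100.0 * rejected_true / len(true_scores))
        type2.append(100.0 * accepted_false / len(impostor_scores))
    return type1, type2


def equal_error_rate(type1, type2, thresholds):
    """Return (threshold, rate) at the grid point where the curves are closest."""
    i = min(range(len(thresholds)), key=lambda k: abs(type1[k] - type2[k]))
    return thresholds[i], (type1[i] + type2[i]) / 2.0


# Made-up scores for illustration only (not data from the report).
true_scores = [0.8, 1.0, 1.1, 1.3, 1.4, 1.6, 1.9, 2.4]
impostor_scores = [1.7, 2.2, 2.6, 2.9, 3.1, 3.5, 3.8, 4.2]
thresholds = [0.5 * k for k in range(1, 10)]  # 0.5, 1.0, ..., 4.5

type1, type2 = error_curves(true_scores, impostor_scores, thresholds)
threshold, eer = equal_error_rate(type1, type2, thresholds)
print(threshold, eer)  # 2.0 12.5
```

For Trues vs. Trues, `impostor_scores` would hold the other subjects' true signatures scored against the template. For Trues vs. Attempted Forgeries, it would hold the forgery attempts.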


a.  Trues  vs.  Trues 


The initial set of 44 features was described in III-A-2. To select a "best"
subset of the 44 features we began by dividing the signature data into two
sets, a training set and a testing set. Because we collected an equal number
of sitting and standing signatures,† a natural division was made on this
basis. To begin, we used the sitting signature data as the training set on
which feature selection was performed to determine a best subset of the 44
original features (i.e., the subset that yields the lowest error rate). Using
the feature selection method described in III-A-3, the best subset consisted
of Features 1, 2, 3, 6, 11, 12, 13, 14, 16, 20, 22, 25, 26, 27, 28, 29, 30,
32, 33, 38, 40, 41, 42, 43, and 44. These features are described in III-A-2.
The standing data were then used to calculate the Type I/Type II error
curves, which are shown in Figure 6. To compare results we will use the
point at which the Type I/Type II errors are equal (i.e., the percent error
where the curves intersect), which is called the equal-error rate. This
equal-error rate, indicated by the horizontal dashed line in Figure 6, is
about 1.5 percent for the standing data. To cross-validate the results we
reversed the roles of the sitting and standing data. In this case the
standing data were used to train (i.e., for feature selection) and the
sitting data for error-rate calculation. The feature set selected using the
standing data was the same as had been derived using the sitting data. The
Type I/Type II error curves for the sitting data are shown in Figure 7.
Comparison of Figures 6 and 7 shows the error curves to be essentially
identical, so the cross-validation yielded very consistent results, which
gives us added confidence in the results. It may also be concluded that there
is essentially no difference in performance whether the subject is sitting or
standing when he writes.


*See III-A-4 for more details. Recall that the Type I/Type II errors are
calculated based upon allowing three tries (signatures) per verification trial.

†Signatures were obtained from subjects both sitting down at a table and
standing at a counter. See Section II for details.


b.  Trues  vs.  Attempted  Forgeries 

For the Trues vs. Attempted Forgery* Type I/Type II error-curve calculations
we decided to use the same set of best features that had been used for the
Trues vs. Trues calculations. The reason for this is that the generality of
the forgery data is uncertain because very little is known about the forger
population. In any case, the use of the Trues vs. Trues feature set is a
conservative approach, and there is no question of testing and training on
the same data set.

The Type I/Type II error curves for the standing true-signature data
versus the attempted forgery data are shown in Figure 8. The equal-error rate
is approximately 2.25 percent, somewhat worse than the 1.5 percent equal-error
rate of the Trues vs. Trues data. The Type I/Type II error curves for the
sitting Trues vs. Attempted Forgeries data are shown in Figure 9. The
equal-error rate is almost 3 percent.

Data analysis showed that almost all the successful forgeries occurred for
the two or three true signers who were the most inconsistent in writing their
signatures. A simple enrollment criterion based on the total variance of the
template was subsequently tested. If the combined standard deviation was
larger than some assigned threshold, the subject failed the enrollment
criterion and was excluded. This resulted in the exclusion of three subjects
out of 58 and yielded considerable improvement in signature verification
performance.† Figures 10 and 11 show the Type I/Type II error curves (for
standing and sitting data, respectively) when this enrollment criterion is
used. The equal-error rates are reduced to 1.75 percent for the Trues vs.
Forgeries (standing) and to 1.25 percent for the Trues vs. Forgeries
(sitting). By making the enrollment criterion even more stringent, to where
six or seven of the most inconsistent subjects (out of 58) are excluded, the
equal-error rates are on the order of 0.5 to 0.75 percent.


*In attempting to forge a signature each forger was allowed up to 18 tries,
nine before viewing the video tapes and nine after. Because we found that
there is only a slight difference in the error rates for the two conditions,
the Type II error curves presented in this section are calculated using the
combined set of forgery attempts.

†This behavior is typical of verification systems. Usually most of the errors
are contributed by a very small percentage of system users.


[Figures 8 through 11 plot percent error against decision threshold.]

FIGURE 8 TYPE I/TYPE II ERROR CURVES FOR TRUES VERSUS ATTEMPTED FORGERIES,
STANDING DATA, NO ENROLLMENT CRITERIA

FIGURE 9 TYPE I/TYPE II ERROR CURVES FOR TRUES VERSUS ATTEMPTED FORGERIES,
SITTING DATA, NO ENROLLMENT CRITERIA

FIGURE 10 TYPE I/TYPE II ERROR CURVES FOR TRUES VERSUS ATTEMPTED FORGERIES,
STANDING DATA, WITH ENROLLMENT CRITERIA

FIGURE 11 TYPE I/TYPE II ERROR CURVES FOR TRUES VERSUS ATTEMPTED FORGERIES,
SITTING DATA, WITH ENROLLMENT CRITERIA

Tests were also made of the effect of allowing the forgers to view video
tapes of the true signers writing their signatures. The error rate was
slightly worse when the forgers were allowed to view the video tapes, which
implies that the forger can learn something of the signature dynamics by
closely observing the true signer. The total effect, however, was not
particularly significant.
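The variance-based enrollment criterion used in this subsection can be sketched as below. This is a hedged illustration only. The report states that a subject is excluded when the combined standard deviation of the template exceeds an assigned threshold, but it does not give the formula in this section. The sketch assumes that "combined" means the square root of the summed per-feature variances over the enrollment signatures. The threshold value and the feature vectors are hypothetical.

```python
import math
from statistics import pvariance


def combined_std(enrollment_features):
    """Combined standard deviation of a template.

    enrollment_features: list of feature vectors, one per enrollment signature.
    Assumption: the per-feature variances are summed and the square root is
    taken; the exact formula used in the study is not given here.
    """
    columns = zip(*enrollment_features)
    total_variance = sum(pvariance(col) for col in columns)
    return math.sqrt(total_variance)


def passes_enrollment(enrollment_features, max_combined_std):
    """Return True when the subject is consistent enough to be enrolled."""
    return combined_std(enrollment_features) <= max_combined_std


# Made-up two-feature examples (not data from the report).
consistent = [[10.0, 5.0], [10.2, 5.1], [9.8, 4.9]]
erratic = [[10.0, 5.0], [14.0, 2.0], [6.0, 8.0]]
limit = 1.0  # hypothetical threshold

print(passes_enrollment(consistent, limit), passes_enrollment(erratic, limit))
# True False
```

Tightening `limit` excludes more of the inconsistent writers, which is the trade-off the text describes when six or seven subjects are excluded instead of three.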


3.  Performance  Results  for  Individualized  Feature  Sets 


As discussed earlier, improved performance can be expected when
individualized feature sets are used. In this section we show, by example,
the kind of improvement that can be expected. Of all the data base subjects,
CMS was the worst in the sense of contributing the most to the Type I/Type II
error rates. In Figure 12, the solid lines are the Type I/Type II error
curves for subject CMS's true signatures vs. attempted forgeries using the
standard feature set described in IV-A-2. The equal-error rate is over
6 percent.


FIGURE 12 TYPE I/TYPE II ERROR CURVES FOR SUBJECT CMS FOR THE STANDARD
FEATURE SET AND FOR AN INDIVIDUALIZED FEATURE SET


The individualized feature set for CMS, which was derived using the method
described in III-A-3, consisted of Features 1, 2, 6, 11, 13, 16, 26, 27, 32,
38, 40, and 44.* The Type I/Type II error curves for subject CMS using this
individualized feature set are given by the dashed curves in Figure 12.
Note the substantial improvement compared to the Type I/Type II error curves
for the standard feature set. In fact, for the individualized feature set
there is no cross-over at all of the Type I/Type II error curves, and so
the equal-error rate is zero. This result, however, is based on a small
amount of data (one subject's true signatures and the associated attempted
forgeries), and it would not be appropriate without extensive further testing
to conclude that individualized feature selection would yield a Type I/Type II
error rate of zero. Nevertheless, based on this result and previous
experience, we believe (but have not proven) that a conservative statement of
the improvement that could be expected from individualized feature selection
is that the equal-error rate would be reduced by at least a factor of two
(i.e., the equal-error rate would be on the order of 0.5 percent or better
rather than the 1 percent given in the preceding subsection).


B.  Correlation  Technique  for  Signature  Verification 

The rubbery correlation algorithm for signature verification was described
in III-B. Because of time limitations and the fact that our PDP 11/40
computer was down with hardware problems for more than two months, we were
unable to process the entire data base using the correlation algorithm.
However, the main question is whether the rubbery correlation technique is
more effective than the features technique for signature verification. To
answer this question we processed true vs. attempted forgery data for those
subjects for which the features technique yielded relatively poor
performance. As discussed in IV-A-3, subject CMS contributed the most to the
errors that occurred with the features technique (an equal-error rate of more
than 6 percent). For the same set of data, subject CMS's Type I/Type II error
curves for the rubbery correlation signature-verification algorithm are shown
in Figure 13. These results may be compared with the Type I/Type II error
curves (indicated by the solid lines in Figure 12) for the features
technique. There is no overlap in the curves in Figure 13, and so the
equal-error rate is 0, a dramatic improvement.

Although we were not able to process enough data with the correlation
algorithm to give a statistically confident estimate of the Type I/Type II
error curves, our tests with some of the problem subjects, such as that for
CMS described above, suggest very strongly that the correlation technique is
substantially superior to the features technique for signature verification.

As noted earlier, the problem with individualized feature selection is
the requirement for a long enrollment period with many signatures. However,
this is not a problem for the correlation algorithm. The results for subject
CMS described above were based on using only the first nine signatures for
enrollment. The only disadvantage of the correlation technique, compared to
the features technique, is somewhat increased processing time and increased
computer storage requirements for the subject templates. For high-security
applications, these disadvantages are probably not very important.


*These features are described in III-A-2.


FIGURE 13 TYPE I/TYPE II ERROR CURVES FOR SUBJECT CMS (TRUES VERSUS ATTEMPTED
FORGERIES) FOR THE "RUBBERY" CORRELATION SIGNATURE-VERIFICATION
ALGORITHM


C.  Features  Technique  for  Subject  Identification  Based  on  a  Handwritten 
Sequence  of  Five  Numerals 

In this section we present the results of the analysis of the handwritten
numeric sequence data base using the SPSS program DISCRIMINANT.* The SPSS
control file† used for the data analysis, which is shown in Figure 14, was set
up so that the DISCRIMINANT program used approximately half (by random
selection) of the 1,740 numeric sequences in the data base for training
(i.e., to estimate the discriminant functions) and the other half for testing
(error-rate calculations).


*See III-C for the reasons that we chose to use the SPSS programs for the
numeric sequence analysis, as well as for references relating to program
documentation and data analysis algorithms.

†The use of the SPSS control file and the many program options is described in
detail in N. H. Nie, et al., SPSS, 2nd ed. (New York: McGraw-Hill, 1975).


RUN NAME        RADC SIGNATURE DATA, FEATURES SELECTED 11-MAY-81 AT 06:46:35
VARIABLE LIST   AUTHOR,SEQUENCE,POSITION,TRUEFORG,FEATUR01 TO FEATUR44
INPUT MEDIUM    DISK
INPUT FORMAT    FIXED(2F3.0,2F2.0,7F10.3,/,10X,7F10.3,/,10X,7F10.3
                ,/,10X,7F10.3,/,10X,7F10.3,/,10X,7F10.3
                ,/,10X,2F10.3)
VALUE LABELS    AUTHOR (1)AAF (2)AEP (3)AEV (4)ASI (5)BEP (6)BJG
                (7)CAU (8)CBW (9)CEP (10)CMS (11)DEP (12)DRB
                (13)DWV (14)ELF (15)EMU (16)FET (17)FJM (18)FLL
                (19)GAN (20)GEG (21)GEW (22)HEP (23)HFS (24)JCZ
                (25)JEE (26)JEM (27)JEP (28)JJS (29)JLP (30)JNH
                (31)JRL (32)KCN (33)KES (34)LAL (35)LEL (36)MAB
                (37)MAN (38)MER (39)MFA (40)MRC (41)OEK (42)PER
                (43)PES (44)PJP (45)PLH (46)RAB (47)RTK (48)RWH
                (49)RKR (50)SAW (51)SDJ (52)SEA (53)SEC (54)SEM
                (55)SRW (56)TDK (57)TPP (58)TSS (59)VKR/
                POSITION (0)STAND (1)SIT/
                TRUEFORG (0)TRUE (1)FORGER/
N OF CASES      UNKNOWN
SEED            STANDARD
COMPUTE         WGTVAR=1
IF              (UNIFORM(1) LE 0.5) WGTVAR=0
WEIGHT          WGTVAR
PRINT FORMATS   FEATUR01,FEATUR11,FEATUR21,FEATUR41,FEATUR44 (3)
LIST CASES      CASES=100/
                VARIABLES=AUTHOR,SEQUENCE,POSITION,TRUEFORG
                ,FEATUR01,FEATUR11,FEATUR21,FEATUR41,FEATUR44/
READ INPUT DATA
DISCRIMINATE    GROUPS=AUTHOR(1,59)/
                VARIABLES=FEATUR01 TO FEATUR44/
                ANALYSIS=FEATUR01 TO FEATUR44/
                METHOD=DIRECT/
                PRIORS=EQUAL/
OPTIONS         5,6,10,11,12,20
STATISTICS      1,2,3,4,6


FIGURE 14 SPSS CONTROL FILE
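The random half-and-half split set up by the control file (a weight variable that starts at 1 and is reset to 0 when a uniform random draw is at most 0.5) can be mimicked in Python as a rough sketch. One assumption here is that weight-1 cases form the training half and weight-0 cases the testing half. The seed and function name are illustrative and do not reproduce SPSS's random number stream.

```python
import random


def split_cases(n_cases, seed=0):
    """Assign each case index to training (weight 1) or testing (weight 0).

    Mirrors COMPUTE WGTVAR=1 followed by IF (UNIFORM(1) LE 0.5) WGTVAR=0.
    Assumption: weight-1 cases estimate the discriminant functions.
    """
    rng = random.Random(seed)
    weights = [0 if rng.random() <= 0.5 else 1 for _ in range(n_cases)]
    train = [i for i, w in enumerate(weights) if w == 1]
    test = [i for i, w in enumerate(weights) if w == 0]
    return train, test


train, test = split_cases(1740)
print(len(train), len(test))  # two groups of roughly 870 cases each
```

Because each case is assigned independently, the two halves are only approximately equal in size, which matches the text's "approximately half."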

The basic result was that 90.4 percent of the numeric sequences in the
testing data set were classified correctly; that is, 90.4 percent of the time
a subject was identified correctly based upon a single handwritten numeric
sequence. This recognition rate can be improved by allowing the subject to
try again if his first handwritten sequence fails to identify him correctly.*
The 95 percent confidence limits on the 90.4 percent recognition rate are
±2 percent.†


*For example, assuming independence, the recognition rate allowing two trials
would be 99.1 percent.

†90.4 percent is an estimate of the true recognition rate for the population.
The confidence limits simply state that we are 95 percent sure, given the
estimate of 90.4 percent calculated from the data, that the true population
recognition rate is between 88.4 and 92.4 percent.
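Both footnoted figures can be checked with a short calculation. The sketch below uses the independence assumption stated in the footnote for the two-trial rate. For the confidence limits it uses a normal approximation to the binomial with a test-set size of about half of the 1,740 sequences. That sample size and the choice of approximation are assumptions, since the report does not state how its limits were computed.

```python
import math


def rate_with_retries(single_try_rate, tries):
    """Probability of at least one correct identification in `tries` attempts."""
    return 1.0 - (1.0 - single_try_rate) ** tries


def normal_approx_half_width(rate, n, z=1.96):
    """Half-width of a 95 percent interval under the normal approximation."""
    return z * math.sqrt(rate * (1.0 - rate) / n)


p = 0.904
n_test = 1740 // 2  # assumption: about half of the sequences were tested

two_try = rate_with_retries(p, 2)
half_width = normal_approx_half_width(p, n_test)
print(f"{100 * two_try:.1f} {100 * half_width:.1f}")  # 99.1 2.0
```

Under these assumptions the calculation reproduces both the 99.1 percent two-trial rate and limits of about ±2 percent.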
Figure 15 presents a summary of the identification results on a
subject-by-subject basis. The vertical column of initials is the actual
author of the numeric sequence and the horizontal row of initials along the
top gives the initials of the subject identified as the author (which may or
may not correspond to the true author, depending on the success of the
identification process). For example, the first subject on the vertical
column of subject initials is AAF. Looking across that row we see that 17
numeric sequences of AAF were tested and all were correctly identified as
having been written by AAF; all 17 responses are listed under the column
headed AAF. Similarly, there were 12 total numeric sequences tested for
subject AEP and all 12 of them were identified correctly as having been
written by AEP. Reading across the row for AEW, we see that there was a total
of 15 numeric sequences tested. Of these, 14 were identified correctly and
one was incorrectly identified as having been written by subject AAF. If the
recognition rate were 100 percent, there would be no off-diagonal terms in
Figure 15. Only about half of the subjects are listed in Figure 15 because a
59 × 59* identification table would not fit on one page. The intent, in any
case, was not to list all the subject-by-subject identification results
exhaustively but to present an example of how the identification results were
distributed.


FIGURE 15 SUMMARY OF IDENTIFICATION RESULTS SUBJECT BY SUBJECT
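The way Figure 15 is read can be illustrated with a small confusion-table sketch in Python. It contains only the three rows described in the text (AAF, AEP, and AEW), so the rate it prints applies to those rows alone and is not the overall 90.4 percent figure. The data structure and helper names are illustrative assumptions.

```python
from collections import Counter

# (actual author, identified author) -> number of sequences, for the three
# rows the text walks through; the remaining rows of Figure 15 are omitted.
counts = Counter({
    ("AAF", "AAF"): 17,
    ("AEP", "AEP"): 12,
    ("AEW", "AEW"): 14,
    ("AEW", "AAF"): 1,  # the single off-diagonal entry described above
})


def row_total(table, author):
    """Number of sequences tested for one actual author."""
    return sum(n for (actual, _), n in table.items() if actual == author)


def recognition_rate(table):
    """Fraction of sequences on the diagonal (actual == identified)."""
    total = sum(table.values())
    correct = sum(n for (actual, found), n in table.items() if actual == found)
    return correct / total


print(row_total(counts, "AEW"), round(100 * recognition_rate(counts), 1))
# 15 97.7
```

Off-diagonal keys such as `("AEW", "AAF")` are exactly the misidentifications that would vanish at a 100 percent recognition rate.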


D.  Human  Engineering  and  User  Acceptance 

Among the subjects polled, there were only two minor complaints concerning
the signature verification system. The first of these was that it was
difficult to see what one was writing because of the relatively large
cylindrical structure at the writing end of the pen. The second had to do
with the wire attached to the pen. However, all subjects adapted very
quickly, and these problems did not affect the system's operation.


*All 59 subjects were used in this part of the analysis.


V SUMMARY


In previous sections we described the data base and data collection
protocol, the signature verification algorithms and associated data base
analysis procedures, and the results of the performance analysis. The
performance analysis section presented estimates of the access time and
Type I/Type II error curves for three signature verification algorithms*
under a variety of conditions. In this section we provide a summary of the
most essential results of the performance analysis.

The average access time was 8.5 seconds, dominated by the time required
to write signatures. Type I/Type II error curves for the features technique
using a standard feature set for all subjects were shown in Figure 11. The
equal-error rate (the percent error at the point where the Type I/Type II
curves intersect) is slightly more than 1 percent. These curves were
calculated using attempted forgery data and all the true signatures in the
data base collected with the subjects sitting at a table. The 648 attempted
forgeries were obtained from trained forgers who were given copies of the
true signers' signatures, instructed in how the signature verification system
worked and what it measured, allowed to watch video tapes of the true signers
writing their signatures, and allowed to practice as much as they desired
over a three-week period. Enrollment criteria, based on the variance of the
template, were imposed so that subjects who were extremely variable in writing
their signatures were not accepted by the system. Of the 59 subjects in the
data base, only three were unable to meet these enrollment criteria.

We  believe  that  the  Type  I/Type  II  error  curves  in  Figure  11  provide  a 
realistic  and  probably  conservative  estimate  (i.e.,  slightly  worse  than  it 
really  should  be)  of  system  performance  for  the  following  reasons: 

•  The  same  feature  set  was  used  for  all  subjects. 

•  Careful  separation  of  testing  and  training  data  was  always  maintained. 

•  The  analysis  simulated  a  real-world  enrollment  procedure  in  which  only 
a  few  signatures  were  available  from  which  to  construct  the  templates. 

•  In a real-world signature-verification application, a subject risks
being denied access if he is careless or sloppy in writing his signature,
but there was no comparable motivation for the subjects to cooperate in
the type of data collection effort described here. In an attempt to
provide at least some motivation, cash prizes were offered for the most
consistent signatures, but in practice this was not greatly successful.
The lack of motivation led to increasing signature variances toward the
end of the data collection period for most subjects, which probably
caused some overestimation of the system's error rates.


*The three signature verification algorithms were a features technique based
on a standard feature set (i.e., a single best set of features for all
subjects collectively), a features technique based upon individualized feature
sets (i.e., a best feature set derived for each subject individually), and a
"rubbery" correlation algorithm.

Signature verification based on the features technique but with
individualized feature selection was also considered. This algorithm was
tested using the problem subjects (the few subjects in the data base who
caused essentially all the errors) and yielded substantially improved
performance compared to the features technique based on a standard feature
set for all subjects (on which the results of Figure 11 are based). Although
we were unable, because of time limitations, to process enough data to obtain
a statistically confident estimate of the Type I/Type II error curves for
individualized feature selection, based on the results of our limited testing
with problem subjects and our previous experience, we believe that the
equal-error rate is probably at least a factor of two better than for the
features technique using a standard feature set for all subjects. The primary
disadvantage of individualized feature selection is that it may require a
relatively large number of enrollment signatures.

Finally, the "rubbery" correlation algorithm was also tested using the
problem subjects' data. Compared to the features technique based on a standard
set for all users, there was a dramatic reduction in error rate. However, as
was the case for the individualized feature selection technique, because of
time limitations we were unable to process enough data to provide a
statistically confident estimate of the overall system Type I/Type II error
curves for the rubbery correlation algorithm. This procedure used only nine
enrollment signatures, comparable to the number required for the standard
features technique. The only disadvantage of the rubbery correlation
technique is that it requires more processing time and computer storage for
subject templates.

In sum, dynamic signature verification based on a three-axis pen system
yields equal-error rates on the order of one percent using a features
algorithm and a standard set of features for all subjects. Analysis of a
limited data set indicates that a substantial reduction in error rate can be
obtained by individualized feature selection or rubbery correlation
algorithms, but at the cost of an increased computational burden. These are
promising areas for future development.


DESCRIPTION OF THE MAGNETIC TAPE CONTAINING
THE SIGNATURE VERIFICATION DATA BASE


A summary of the true signature, numeral, and forgery data was given in
Section II. These data are stored on tape RADC SIGNATURE DATA BASE. The tape
was generated using PIP (the version that supports magnetic tape reading and
writing) on our PDP 11/40. It is a nine-track, 1600-CPI tape, with volume
label = JSO. To read this tape on the PDP 11/70 under RSX 11-M the following
steps are required:

1.  Mount the tape on a 1600-CPI, nine-track tape drive.

2.  Allocate MT: (MCR>ALL MT:).

3.  Mount the tape (MCR>MOU MT:JSO).

4.  PIP can now be used to copy the data on the magnetic tape to the
system disk (or some other disk). For example, to copy file
TEST.FTN to disk DR0: use

PIP>DR0:=MT:TEST.FTN/BS:8192.

(The trailing period must be included.)

Note: Not all versions of PIP read from device MT: properly,
so the correct version must be used.

The true signatures for a subject are stored sequentially, one signature per
record, in a file of the form

TABCXABC.DAT;1

where ABC are the initials of a particular subject. Since there were 59
subjects in the data base, there are 59 such files on the tape.

Similarly, there are 59 files of numeral data, one for each subject. These
are of the form

NNUMXABC.DAT;1

The attempted forgery data are in files of the form

FABCXDEF.DAT;1

where ABC are the initials of the true signer and DEF are the initials of the
forger.
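The three file-name patterns above are regular enough to be generated by a small helper. The sketch below is illustrative only: the function names are hypothetical, and the check that initials are exactly three letters is an assumption based on the examples, not a rule stated in the report.

```python
def _initials(value):
    """Validate and normalize a subject's three-letter initials (assumed form)."""
    if len(value) != 3 or not value.isalpha():
        raise ValueError(f"expected three letters, got {value!r}")
    return value.upper()


def true_signature_file(subject):
    """File holding a subject's true signatures, e.g. TABCXABC.DAT;1."""
    s = _initials(subject)
    return f"T{s}X{s}.DAT;1"


def numeral_file(subject):
    """File holding a subject's numeral data, e.g. NNUMXABC.DAT;1."""
    return f"NNUMX{_initials(subject)}.DAT;1"


def forgery_file(true_signer, forger):
    """File holding one forger's attempts at one signer, e.g. FABCXDEF.DAT;1."""
    return f"F{_initials(true_signer)}X{_initials(forger)}.DAT;1"


print(true_signature_file("CMS"))  # TCMSXCMS.DAT;1
print(numeral_file("CMS"))         # NNUMXCMS.DAT;1
print(forgery_file("ABC", "DEF"))  # FABCXDEF.DAT;1
```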


A test program, TEST.FTN, to read data from the signature, numeral, or
forgery files and write the data out on a file TEST.LST is provided on tape
RADC SIGNATURE DATA BASE. This program and an example of the program's output
are given below.

As shown in the test program listing, the form of the read statement for
a particular record (or signature) is

READ(2) SAMPID, AUTHID, NSAMPS, ICORT, IDORW,
        (IDATE(I),I=1,5), (ITIME(I),I=1,3),
        MNRESP, RMSDIF, (OLDVAL(J),J=1,44),
        ((JDATA(K,I),I=1,NSAMPS),K=1,3)

SAMPID is the label of the response and AUTHID is the writer identification.
For example, if the record contains a true signature of subject ABC then
SAMPID = ABC and AUTHID = ABC. If it is a numeral by the same subject then
SAMPID = NUM and AUTHID = ABC. For the forgery files SAMPID contains the
initials of the true signer and AUTHID the initials of the forger. NSAMPS is
the total number of P, X, or Y samples. NSAMPS × 0.01 gives the length in
seconds of the signature. The array OLDVAL contains the values of the 44
features discussed in III-A-2 and JDATA contains the P, X, Y data (JDATA is
of size 3 × NSAMPS).
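The record fields described above can be summarized as a small in-memory structure. The Python sketch below is not a reader for the tape's binary format, which is not specified here. It is a hypothetical container that holds already-decoded fields and applies the two rules given in the text: the SAMPID/AUTHID convention and the 0.01-second sample period. The class and method names are assumptions.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_PERIOD_S = 0.01  # the text gives NSAMPS x 0.01 as the length in seconds
N_FEATURES = 44         # size of OLDVAL
N_AXES = 3              # JDATA holds three rows (the P, X, and Y data)


@dataclass
class SignatureRecord:
    """Hypothetical container for one decoded record (not the tape format)."""
    sampid: str
    authid: str
    nsamps: int
    oldval: List[float]
    jdata: List[List[int]]

    def __post_init__(self):
        if len(self.oldval) != N_FEATURES:
            raise ValueError("expected 44 feature values")
        if len(self.jdata) != N_AXES or any(len(r) != self.nsamps for r in self.jdata):
            raise ValueError("expected 3 rows of NSAMPS samples")

    def kind(self) -> str:
        """Classify the record using the SAMPID/AUTHID convention."""
        if self.sampid == "NUM":
            return "numeral"
        return "true" if self.sampid == self.authid else "forgery"

    def duration_seconds(self) -> float:
        return self.nsamps * SAMPLE_PERIOD_S


record = SignatureRecord("CMS", "CMS", 672, [0.0] * 44, [[0] * 672 for _ in range(3)])
print(record.kind(), round(record.duration_seconds(), 2))  # true 6.72
```

With NSAMPS = 672, as in the sample output later in this appendix, the signature lasts 6.72 seconds.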
C  Program TEST.FTN
C----------------------------------------------------------------------
C  This program reads a data file (specified by typing in the
C  subject's initials) and prints out the data in specified records on
C  file TEST.LST.  The program prompts the operator for the record
C  number to be printed out.  Each record contains all the data for
C  a particular true signature, numeral, or attempted forgery.  To
C  get a hardcopy of the output, spool file TEST.LST to the line
C  printer (i.e., use PIP and the /SP switch -- PIP>TEST.LST/SP)
C
      DIMENSION OLDVAL(44),IDATE(5),ITIME(3),JDATA(3,3000)
      DIMENSION NAMDAT(9)
C
C  Initialize a disk file for routing debugging printout
      CALL ASSIGN(6,'ZZ0:TEST.LST;1',14)
C
C  Get the type of data and the initials of the subject
      TYPE 5
5     FORMAT(/'$ True signature, Numeral, or Forgery data',
     *  ' (type T,N, or F) ')
      ACCEPT 7,ICHECK
7     FORMAT(A1)
      TYPE 10
10    FORMAT(/'$ Initials of the true signer (3 characters) ')
      ACCEPT 15,AUTHOR
15    FORMAT(A3)
      IF(ICHECK .EQ. 'F') GO TO 22
      IF(ICHECK .EQ. 'T') ENCODE(18,20,NAMDAT) AUTHOR,AUTHOR
      IF(ICHECK .EQ. 'N') ENCODE(18,21,NAMDAT) AUTHOR
20    FORMAT('ZZ0:T',A3,'X',A3,'.DAT;1')
21    FORMAT('ZZ0:NNUM','X',A3,'.DAT;1')
      GO TO 27
22    TYPE 23
23    FORMAT(/'$ Initials of the Forger (3 characters) ')
      ACCEPT 24,FORGID
24    FORMAT(A3)
      ENCODE(18,25,NAMDAT) AUTHOR,FORGID
25    FORMAT('ZZ0:F',A3,'X',A3,'.DAT;1')
27    TYPE 28,NAMDAT
28    FORMAT(/' The file opened is ',9A2)
      CALL ASSIGN(2,NAMDAT,18)
C
C  Get the number of the response to be printed or plotted
29    REWIND 2
      TYPE 30
30    FORMAT(/'$ Record number = ')
      ACCEPT 31,NUMREC
31    FORMAT(I3)
C  Skip records up to the one specified
      IF(NUMREC .EQ. 1) GO TO 37
      DO 36 I=1,NUMREC-1
36    READ(2,END=100)
C----------------------------------------------------------------------
C  Read in the data from the specified record
37    IF(ICHECK .EQ. 'F') GO TO 39
      READ(2) SAMPID,AUTHID,NSAMPS,ICORT,IDORW
     *  ,(IDATE(I),I=1,5),(ITIME(I),I=1,3),MNRESP,RMSDIF
     *  ,(OLDVAL(J),J=1,44),((JDATA(K,I),I=1,NSAMPS),K=1,3)
      GO TO 50
39    READ(2) SAMPID,AUTHID,NSAMPS,IDED
     *  ,(IDATE(I),I=1,5),(ITIME(I),I=1,3),MNRESP,RMSDIF
     *  ,(OLDVAL(J),J=1,44),((JDATA(K,I),I=1,NSAMPS),K=1,3)
C  Print it all out
50    WRITE(6,52) SAMPID,AUTHID,NSAMPS,MNRESP,RMSDIF,IDATE,ITIME
      WRITE(6,54)
      WRITE(6,55) (I,OLDVAL(I),I=1,44)
      WRITE(6,56)
      DO 51 I=1,NSAMPS
      WRITE(6,59) I,JDATA(1,I),JDATA(2,I),JDATA(3,I)
51    CONTINUE
52    FORMAT(1H1,5X,'SAMPID = 'A4,3X,'AUTHID = 'A4,3X,'NSAMPS = 'I4
     *  ,3X,'MNRESP = 'I3,3X,'RMSDIF = 'F6.2
     *  ,//,5X,'DATE = '5A2,5X,'TIME = '5A2)
54    FORMAT(///23X,'I',7X,'Feature Value'/)
55    FORMAT(20X,I4,6X,F10.3)
56    FORMAT(1H1,23X,'I',8X,'X values',4X,'Y values'
     *  ,4X,'P values'/)
59    FORMAT(21X,I4,8X,I5,8X,I5,8X,I5)
C
C  Repeat if desired
C  Allow the option to quit or to specify another record
70    TYPE 75
75    FORMAT(/'$ Print out another record? (Y for Yes or N for No)')
      ACCEPT 76,ICH
76    FORMAT(A1)
      IF(ICH .EQ. 'N') GO TO 9000
      IF(ICH .EQ. 'Y') GO TO 29
      GO TO 70
C
100   TYPE 101
101   FORMAT(//' **** ERROR -- Record number out of range')
      GO TO 29
C
9000  CONTINUE
      END

NOTE: The data and program output are assigned to a pseudo device ZZ0:.
Before executing the program, this pseudo device must be assigned to the
actual disk on which the data is stored. For example, if the data is stored
on DR1: then MCR>ASN DR1:=ZZ0:


FIGURE A-1 TEST PROGRAM LISTING


True signature, Numeral, or Forgery data (type T, N, or F)  T
Initials of the true signer (3 characters)  CHS
The file opened is ZZ0:TCHSXCHS.DAT;1
Record number (integer) = 3

Print out another record? (Y for Yes or N for No)

FIGURE  A-2  EXAMPLE  OF  TEST  PROGRAM  EXECUTION 


SAMPID = CHS   ALTHID = CHS   NSAMPS = 672   MNRESP = 3   RMSDIF = 1.73

DATE = 62-JUN-86     TIME = 12:54:


   I     Feature Value

   1        -2.369
   2        26.963
   3        17.886
   4       -24.624
   5         6.259
   6         6.867
   7       386.066
   8       -16.619
   9       263.606
  10        48.606
  11        -6.955
  12        37.471
  13        16.829
  14       -29.073
  15         8.439
  16         7.695
  17       385.606
  18       -16.111
  19       264.006
  20        50.066
  21        24.462
  22        37.629
  23        21.513
  24       -24.462
  25         8.638
  26         7.135
  27       346.006
  28        -8.836
  29       363.006
  30        42.606
  31        41.916
  32        45.902
  33        45.975
  34        26.195
  35        23.784
  36        -2.948
  37        21.715
  38        22.118
  39        48.923
  40         6.656
  41         5.000
  42         6.586
  43         6.060
  44         6.676


NOTE:  Program  TEST  writes  this  data 
out  to  file  TEST.LST;1  Instead  of 
directly  to  the  line  printer. 


   I    X values    Y values    P values

   1        -1           1           1
   2        -1           6           3
   3        -1           3           6
   4        -1           4          11
   5        -3           9          15
   6        -5          11          21
   7        -5          13          29
   8        -8          15          35
   9       -16          18          43
  10       -11          21          49
  11       -14          24          57
  12       -14          24          65
  13       -21          13          71
  14       -36          -9          75
  15       -42         -32          61
  16       -51         -47          66
  17       -96         -59          89
  18       -56         -65          96
  19       -61         -76          91
  20       -61         -77          92
  21       -61         -79          95
  22       -51         -62          99
  23       -46         -77         162
  24       -21         -64         197
  25        -5         -46         166
  26         3         -36         164
  27        11         -18         162
  28        14          -8          96
  29        16          -3          93
  30        16           5          89
  31        14          12          86


FIGURE  A-3  EXAMPLE  OF  OUTPUT  OF  PROGRAM  TEST 
 393       -18         -81
 394       -14         -97
 395       -12        -111
 396       -12        -125
 397       -15        -133
 398       -18        -135
 399       -25        -132
 400       -36        -119
 401       -53         -98
 402       -64         -72
 403       -60         -49
 404       -66         -32
 405       -60         -20
 407       -46         -13
 408       -36         -19
 409       -23         -29
 410        -8         -33
 411         2         -35
 412        11         -36
 413        17         -34
 414        23         -31
 415        28         -29
 416        32         -23
 417        36         -19
 418        38          -8
 419        36          -1
 420        33          11
 421        27          20
 422        25          27
 423        20          33
 424        17          36
 425        13          36
 426         9          37
 427         8          32
 428         8          28
 429         6          20
 430         3           8
 431        -5         -13
 432       -14         -38
 433       -21         -56
 434       -28         -67
 435       -33         -72
 436       -39         -77
 437       -43         -84
 438       -48         -88
 439       -52         -91
 440       -51         -92
 441       -47         -97
 442       -36         -99
 443       -26         -99
 444       -12         -96
 445         3         -87
 446        13         -75
 447        22         -65
 448        27         -52
 449        30         -38
 450        31         -22
 451        28          -7
 452        22           4
 453        18          13
 454        11          17
 455         3          19
 456        -5          16
 457       -11           4
 458       -15          -9
 459       -20         -23
 460       -22         -35
 461       -22         -43
 462       -14         -53
 463        -6         -58
 464         3         -58
 465        16         -54
 466        22         -47

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AN  ON-LINE  DATA  ENTRY  SYSTEM  FOR  HAND-PRINTED  CHARACTERS 
An On-Line Data Entry System for Hand-Printed Characters*

H.  D.  Crane 

Stanford  Research  Institute 
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Introduction 

The primary method of entering large amounts of routinely
produced, hand-printed data into computer systems
is via manual keyboards. Manual retranscription, however,
entails a number of disadvantages such as extra cost,
delays, and errors.

Optical character recognition attempts to bypass the
manual retranscription process by providing automatic
reading of source documents. However, since OCR
processing typically is separate from document origination,
the generator of the document cannot realize the
benefits that accrue to real-time, on-line automated data
entry. Often there is no way of knowing when substitution
errors have occurred, and OCR equipment is costly relative
to other methods.

Real-time character recognition, i.e., capturing the
material as it is written, obviates the need for manual
retranscription or OCR and provides for immediate
error detection and correction. However, a keyboard
that accommodates a large character set, plus a hard-copy
printer for each data entry station, can be quite
bulky and expensive.

Alternatively, a direct entry system may use an inexpensive
writing device to make its own hard copy and to
produce machine-recognizable code. Writing systems to
track pen motions have been previously described, but
such systems require special writing surfaces or special
writing environments. Therefore, these systems, like the
keyboard printer, also tend to be bulky and expensive.

This paper describes a system that uses a specially
instrumented ball-point pen requiring no special writing
surface. Unlike many OCR techniques, the method
described is dynamic. That is, instead of a post facto
analysis of a complete input pattern (e.g., in terms of
loops, corners, and height), the character recognition
is based on real-time detection and analysis of the sequence
of writing directions taken by the pen. Each character is
described in terms of an allowed set of stroke direction
sequences. The character actually recognized by the system
can be echoed to be verified immediately by the person
generating the document.


Monitoring  the  direction  of  motion 

The  writing  system  is  based  on  the  three-dimensional 
force  generated  at  the  pen  tip  during  writing.  This  force 
consists  of  the  downward  force  directed  toward  the  paper 
and  the  drag  force  in  the  plane  of  the  paper. 

From force measurements alone it is not possible to
derive an accurate measure of pen velocity (and therefore
of pen position), because drag varies with paper friction,
the exact orientation of the pen, and pen pressure.
Furthermore, the system has no knowledge of pen motion
when the pen is lifted from the paper. However, absolute
pen position (although necessary for entering pictorial
input material or for reconstructing the exact form of each
input character as drawn) is not necessary for character
recognition. It is sufficient, as we show subsequently, to
determine the sequence of direction movements, which is
readily obtained from the force measurement.

The  force-measuring  instrumentation  is  incorporated 
into  the  pen  tip  without  any  instrumentation  of  the  writing 
surface  or  the  writing  area.  The  vertical  force  on  the  paper 
indicates when the pen is "down" or "up," i.e., on or off
the  paper.  The  instantaneous  direction  of  pen  motion  is 
readily  determined  from  the  lateral  forces  in  the  plane  of 
the  paper. 


Three-dimensional  force-sensitive  pen 

A previous article on a direction-sensitive pen and its
potential use in hand-printed character recognition showed
that English letters can be described by a sequence of
connected up/down and left/right movements. The pen
used  a  pivoted  writing  shaft  that  moved  in  response  to 
the  writing  force  and  made  electrical  contact  with  one 
of  four  segments  of  a  commutator  ring.  Although  the 
device  showed  the  feasibility  of  such  an  instrument,  it 
was  crude  and  unreliable  because  it  required  mechanical 

*This system was conceived and developed by the authors at Stanford
Research Institute. Xebec Systems, Inc., Santa Clara, California, is
developing a commercial version of the system under license from SRI.
motion of the entire writing shaft. A later version used
a light-emitting diode at the upper end of the pivoted
writing shaft; the light was directed toward a stationary
quadrant photocell. The position of the shaft was tracked
by  monitoring  the  movement  of  the  light  with  respect 
to  the  photocell.  This  optical  system  provided  better  force 
sensitivity  than  the  earlier  version,  but  still  required  the 
entire  writing  shaft  to  move.  Also,  both  of  these  systems 
required  additional  measurements  to  determine  vertical 
pressure. 

In  the  most  recent  design,  the  pen  point  is  mounted  to 
a  diaphragm  containing  a  system  of  strain  gauges  to 
detect  the  instantaneous  lateral  and  vertical  forces  on  the 
point.  This  version  is  shown  in  Figure  1.  The  pen  point 
must  be  maintained  in  a  nominally  vertical  direction  during 
writing;  the  angle  of  the  barrel  can  be  adjusted  to  suit  the 
individual  user. 

Figure  2  shows  the  photographically  etched  strain- 
gauge  array,  which  is  bonded  to  the  diaphragm  inside 
the  housing  that  holds  the  replaceable  ink  cartridge. 
The  center  of  the  diaphragm  is  rigidly  connected  to 
the pen body, as shown in Figure 3a. The force generated
at the writing tip distorts the diaphragm, as
shown  exaggerated  in  Figures  3b  and  3c.  With  normal 
writing,  the  pen  point  deflects  less  than  a  thousandth  of 
an  inch. 

It  is  easier  to  describe  the  operation  of  the  strain- 
gauge  system  if  we  imagine  that  the  eight  gauges  are 
arranged  in  four  pairs,  as  shown  in  Figure  3a,  rather 
than  in  the  actual  planar  array  form  of  Figure  2.  These 
pairs  are  connected  electrically  in  a  compound  bridge 
circuit  (Figure  4)  that  isolates  the  three  components  of 
the  applied  force.  To  see  how  the  bridge  operates,  let  X 
and  Y  represent  the  left/right  and  near/far  directions  in 
the  plane  of  the  writing  surface,  and  P  the  vertically 
directed  force.  A  vertically  directed  force  will  cause  the 
diaphragm  to  bend  as  shown  in  Figure  3b.  The  four 
gauges  on  the  top  of  the  diaphragm  will  be  in  compression, 
and  the  four  gauges  on  the  bottom  of  the  diaphragm 
will  be  in  tension.  Hence,  the  voltages  at  Points  A  and  B 
in  the  upper  bridge  will  change  by  the  same  amount  and 
in  the  same  direction;  these  changes  will  cancel  in  the 
differential  amplifier  in  the  X  channel.  The  voltages  at 
Points  C  and  D  in  the  lower  bridge  will  also  change  by  the 
same  amount  and  in  the  same  direction,  so  there  will  be 
no  change  in  the  Y  output  either.  However,  the  polarity  of 
change at Points C and D is opposite to that at Points A
and  B.  Accordingly,  the  changes  at  all  four  points  are 
additive  in  the  central  amplifier  which  measures  vertical 
pressure.  Thus,  vertical  force  is  monitored  by  the  central 
channel,  with  no  first-order  coupling  to  the  X  and  Y 
channels. 

A  lateral  force  in  the  X  direction  will  cause  the  diaphragm 
to  bend  as  shown  in  Figure  3c.  In  this  case,  Points  A  and 
B  will  move  in  equal  but  opposite  directions.  These 
changes  are  additive  in  the  output  of  the  X  channel  but 
cancel  in  the  P  channel.  Thus,  an  X-directed  force  will 
cause  a  change  only  in  the  X  channel.  Similarly,  a  Y- 
directed  force  will  cause  a  change  only  in  the  Y 
channel.  An  arbitrary  force  on  the  pen  point  can  thus 
be resolved into X, Y, and P components.
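The cancellation argument above can be checked numerically. The sketch below assumes an idealized, perfectly linear bridge with unit gains; the function names, the sign conventions, and the gain `k` are illustrative assumptions, not values taken from the pen's actual circuit.

```python
def point_voltages(fx, fy, p, k=1.0):
    """Hypothetical voltage changes at bridge points A, B (upper) and C, D (lower).

    A vertical force p moves A and B together and C and D together, with
    opposite polarity between the two bridges. A lateral force moves the
    two points of one bridge by equal and opposite amounts.
    """
    d_a = k * (p + fx)
    d_b = k * (p - fx)
    d_c = k * (-p + fy)
    d_d = k * (-p - fy)
    return d_a, d_b, d_c, d_d


def channel_outputs(d_a, d_b, d_c, d_d):
    """Combine the four point voltages as the three amplifiers are described to."""
    x = d_a - d_b                      # differential amplifier, X channel
    y = d_c - d_d                      # differential amplifier, Y channel
    p = (d_a + d_b) - (d_c + d_d)      # central amplifier, vertical pressure
    return x, y, p
```

Under these assumptions a pure vertical force appears only in the P output, and a pure X or Y force appears only in its own channel, which is the first-order decoupling the text describes.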

Note  in  Figures  3b  and  3c  that  the  polarity  of  strain  on 
the  lower  side  of  the  diaphragm  near  the  center  is 
the  same  as  the  polarity  of  strain  on  the  upper  side  of 
the  diaphragm  near  the  periphery.  It  is  for  this  reason 
that  the  four-pair  gauge  system  can  be  realized  in  the 
single-sided,  planar  array  shown  in  Figure  2. 

From the X and Y components of force, it is straightforward
to determine the instantaneous angle of force
in the plane of the writing surface (i.e., the direction of
writing), as well as the magnitude of the force in that


Figure 1. (a) Ball-point pen that measures the three-dimensional
force generated at the tip during writing;
(b) replaceable ink cartridge and ballpoint-tip
assembly.

Figure 2. Schematic drawing of the photographically etched
array of strain gauges that is bonded to a diaphragm
inside the cartridge housing.

Figure 3. (a) A strain-gauge arrangement in which the paired
gauges are located on opposite sides of the diaphragm;
(b and c) illustration of the effect of a
downward and lateral force (highly magnified).

Figure 4. Compound bridge for isolating the three-dimensional
force components.

direction. The system also provides a continuous measure
of P, the vertical force (orthogonal to the writing surface).
Although the pen provides high-resolution force measurements,
it is sufficient for hand-printed data entry to quantize
the measurements quite coarsely. In the vertical
direction, it is necessary to know only that the pen point
is "up" or "down," i.e., whether the vertical force is greater
than some threshold. The X and Y signals are quantized
into the four quadrant directions: up, right, down, and
left, symbolized by U, R, D, and L. "Pen-up," symbolized
by a dot (.), can be thought of as a fifth direction of motion.
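A minimal sketch of this coarse quantization follows. It assumes the X and Y signals are already signed so that positive X means rightward motion and positive Y means upward motion on the page, that the four quadrants are centered on the axes (boundaries at 45 degrees), and that the pressure threshold is an arbitrary illustrative value; none of these details are specified in the text.

```python
def quantize(fx, fy, p, p_threshold=0.1):
    """Map one force sample to 'U', 'R', 'D', 'L', or '.' (pen-up)."""
    if p <= p_threshold:
        return "."                      # vertical force below threshold: pen is up
    if abs(fx) >= abs(fy):              # lateral force is mostly left/right
        return "R" if fx >= 0 else "L"
    return "U" if fy > 0 else "D"       # otherwise mostly up/down
```

For example, `quantize(1.0, 0.2, 1.0)` gives `'R'`, while any sample with `p` at or below the threshold gives `'.'` regardless of the lateral force.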

The following section shows how these five direction
signals (U, R, D, L, and .) can be utilized in a practical
character-recognition system. In this system, the direction
of writing is sampled at a clock rate of approximately 30
to 100 per second. At this clock rate, each new direction
signal generally persists for many clock cycles.
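Because each direction persists for many clock cycles, the sampled stream contains long runs of identical symbols, and only the changes carry information. The sketch below collapses such a stream into its direction sequence; it is an illustration only (the recognizer described later achieves the same effect through state transitions rather than by an explicit preprocessing pass), and the function name is an assumption.

```python
from itertools import groupby


def direction_sequence(samples):
    """Collapse runs of identical direction samples into one symbol each."""
    return [direction for direction, _run in groupby(samples)]
```

A slow and a fast rendering of the same stroke pattern, such as `"RRRRRRDDDDDDDD..."` and `"RD."`, collapse to the same sequence.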

Sequential  character  recognition  algorithm 

With  the  signals  provided  by  the  pen,  direction  of  writing 
is  the  only  information  available  for  character  recognition. 
As each character is printed, the pen generates a sequence


Figure 5. An idealized set of numeric characters. The symbols
U, R, D, L, and . represent up, right, down, left, and pen-up,
respectively.


of direction signals describing its motion. With a reasonable
set of constraints on character formation, the direction
sequences are sufficient for machine recognition of the
printed characters. Figure 5 shows a typical set of direction
sequences that is unique for the ten digits. For example,
the sequence for a 1 is (D,.), meaning a down stroke
followed by a pen-up. It would be trivial to design a
logic system to recognize each character as shown.
However, there is wide variation in the way people form
characters. It is advantageous, therefore, to allow as broad
a range of sequences as possible for each character.

One possible approach to the sequence-recognition problem
is the "table look-up," which lists all allowed sequences
for each character; when a character is written, the
generated sequence is compared with each entry in the
table. Allowance for a wide variation in writing styles may
require an excessively long table. A more efficient way
utilizes a state machine with state transitions determined
by the direction sequences. By appropriately specifying
the state transitions, particular directions or direction
sequences may be ignored if they are not relevant to the
recognition process.

Figure 6. A portion of the state machine logic associated with the 1, 7, 9, and erase characters.

In  this  section,  we  consider  the  operation  of  such  a 
machine.  The  next  section  shows  how  the  state  machine 
can  be  efficiently  implemented  with  ROM  components. 

Figure 6 illustrates the portion of the graph of the
sequential decision machine that recognizes the digits 1,
7, and 9, as well as the "erase" character. The design
demonstrates the range of possibilities that may be achievable.
The figure shows broad horizontal lines, which represent
the various states of the machine, and vertical LINK
PATHS, which describe the state transitions. The ovals
at the bottom indicate output characters and the next
state following the output. The logic structure shown is
that of a class-4 state machine, in which both the next
state, g(X,Q), and the output, f(X,Q), are determined by
the present state, X, and the inputs, Q. The highest
state of the machine, marked INIT (initial), becomes
energized whenever a character has been recognized, and
a new search begins. It is convenient to think of a
marker advancing through the graph as each different
direction is recognized in sequence. For example, an initial
left stroke would move the marker to state IL (Initial
Left). The marker would remain at State IL for as long
as the sampled direction signal remained unchanged. If
the writing subsequently turned down, D, the marker
would advance to state LD (Initial Left followed by Down).
Because the state transitions depend solely on the direction
sequences, the path through the graph is independent
of both speed of writing and size of characters.
Exceptions are the front-end and back-end timing delays
described below.

In a general state machine, any number of link paths
may leave a state. Each state of the pen machine has
six link paths, five corresponding to the five directions
(U,R,D,L,.), and a sixth (described below) resulting from
"timing out" (i.e., remaining at an internal state with pen
up for a certain duration). Thus each state has six possible
successors. If, in Figure 6, a particular direction is not
noted as a link path from a state, it means that that
direction returns the marker to that state. For example,
state DD (Figure 6, lower-left) is entered via a D link
path, but subsequent D signals will not move the marker,
nor will a U signal. Only an L, R, or . signal following
the D will advance the marker.
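The rule that an unlisted direction returns the marker to its current state can be sketched with a small transition table. Only the states INIT, IL, and LD and the two links between them come from the text; the table layout and function names are assumptions made for illustration, and the table covers only a tiny fragment of the graph.

```python
# Link paths that actually move the marker; everything else is a self-loop.
TRANSITIONS = {
    "INIT": {"L": "IL"},   # initial left stroke
    "IL": {"D": "LD"},     # initial left followed by down
}


def step(state, direction, transitions=TRANSITIONS):
    """Advance the marker by one sampled direction."""
    return transitions.get(state, {}).get(direction, state)


def run(samples, state="INIT"):
    """Feed a stream of sampled directions through the graph."""
    for direction in samples:
        state = step(state, direction)
    return state
```

Because repeated samples of the same direction hit the self-loop, `run("LLLLLLLLDDDD")` and `run("LD")` end in the same state, which is the independence from writing speed and character size noted earlier.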
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Let  us  now  consider  how  this  machine  implements  the 
specific character recognition sequence for the character 1.
Although 1 is nominally described as a single downstroke
followed by a pen lift, the logic can accommodate a wide
variation of sequences that are equivalent. For example,
starting from the INIT state, sequences (D,.), (U,D,.),
(R,D,.), and (U,R,D,.) are all equivalent in moving the
marker to state AA. Furthermore, the (R,D,.) and (U,R,D,.)
sequences can be terminated with an R stroke, i.e.,
(R,D,R,.) and (U,R,D,R,.), without affecting the final
termination of the marker at Node AA. The (D,.) and
(U,D,.) strokes can be terminated with a U as well as
an R stroke, e.g., (D,U,.) or (U,D,U,.). Thus all these
sequences are equivalent to the (D,.) sequence. The acceptable
ways to make the basic downstroke of the character 1
are summarized in Figure 7a, which illustrates the ability
of the logic to ignore the inevitable glitches produced by
human writers.
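The equivalence of these sequences can be illustrated with a hypothetical fragment of a state graph. The graph below is not the graph of Figure 6, which is not reproduced here; the intermediate state names IU and IR and all of the link paths are assumptions chosen only so that the sequences listed above end at AA. It is also more permissive than the text (for instance, it ignores any stray direction after the downstroke).

```python
# Hypothetical link paths; any direction not listed leaves the marker in place.
GRAPH = {
    "INIT": {"U": "IU", "R": "IR", "D": "ID"},
    "IU": {"R": "IR", "D": "ID"},    # spurious initial up stroke
    "IR": {"D": "ID"},               # spurious initial right stroke
    "ID": {".": "AA"},               # downstroke, then pen-up
}


def final_state(sequence, graph=GRAPH, state="INIT"):
    """Return the state reached after a sequence of direction symbols."""
    for direction in sequence:
        state = graph.get(state, {}).get(direction, state)
    return state


EQUIVALENT_ONES = ["D.", "UD.", "RD.", "URD.", "RDR.", "URDR.", "DU.", "UDU."]
```

Every entry of `EQUIVALENT_ONES` drives the marker from INIT to AA without any of the sequences being listed in the graph itself, which is the advantage over a table look-up that the text points out.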

Figure 7. The basic variations allowed in making the characters
1, 9, and erase.


Provision for this range of spurious initial and final
signals, however, produces a conflict with the 7 (basically
an (R,D,.) sequence), which would be treated as a 1.
To avoid this conflict, the seven is completed with a cross
stroke in the European manner.

This illustration of the crossing of a character introduces
the problem of character segmentation. How does the system
know whether an (R,D,.) sequence, for example, is to
be a 1, or whether it will subsequently be crossed, meaning
a 7? The conflict is resolved with the conditional output
logic implemented at state AA. To follow this conditional
output scheme, note that any of these cross-stroke
sequences, (R,.), (U,R,.), (R,U,.), (U,R,U,.), or (U,.),
will advance the marker from Node AA through subsequent
states to the 7 output port. Any other sequence implies
that a 1 was intended and that the subsequent strokes
were the beginning strokes of a new character. (Note that
the cross-stroke sequences are therefore not allowable as
the beginning strokes of a character.) Thus, a left
stroke should signal a 1 and move the marker back to
state IL, where an initial left movement would have moved
the marker from an INIT start. Of course, many other
direction sequences can follow a 1. These all signal a 1
and move the marker to the appropriate internal state.
Consider, for example, a 1 following a 1. The first 1 will
energize state AA. The subsequent (D,.) sequence (or any
of its equivalences) will energize output port (1,ID), thus
signaling a 1 (the first 1) and returning the marker directly
to state AA via state ID. In other words, a sequence of
1's will continually cycle through the output port (1,ID)
and then back to state AA through state ID.

Also shown in Figure 6 is a portion of the logic
associated with the detection of digit 9 and the erase
character. The former is described nominally as an (L,U,R,D,.)
sequence, although variations are also permitted. In particular,
the sequence can begin with a down stroke, i.e.,
(D,L,...), and can end with a left stroke or even with a
left, up stroke. Accepted variations are shown in Figure 7b.
The erase character is basically a left stroke with the
allowed variations shown in Figure 7c.

We have noted that, from any state, an arbitrary output
code can be signaled and the marker advanced to any other
state. Let us note one other special capability: timing out.
A timing function is provided that measures the elapsed
time since the last pen-up. If the elapsed time before the
next pen-down is greater than some specified magnitude
(e.g., 500 msec), the marker will automatically return to
the INIT state, and an arbitrary output can be signaled.
This is handled by treating the timing-outs as a sixth link
path from each state. Without this special timing action,
a 1, for example, as the last character in a string would
cause the marker simply to advance to and remain at state
AA. With timing-out, a 1 output is automatically produced
and the marker returned to the INIT state.
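The conditional output at state AA and the timing-out link path can be sketched together as a small machine whose links carry an optional output. The states INIT, ID, IL, and AA and the output port (1,ID) come from the text; the table layout, the use of `'T'` as a sixth input symbol for timing out, and the function name are assumptions for illustration, and only the links needed for a string of 1's are included.

```python
# (state, input) -> (next state, output). Inputs are U, R, D, L, '.' and
# 'T' for the timing-out link path. All entries are illustrative only.
LINKS = {
    ("INIT", "D"): ("ID", None),
    ("INIT", "L"): ("IL", None),
    ("ID", "."): ("AA", None),
    ("AA", "D"): ("ID", "1"),     # output port (1,ID): emit the pending 1
    ("AA", "L"): ("IL", "1"),     # a 1 was intended; a new character begins
    ("AA", "T"): ("INIT", "1"),   # last character in a string
}


def recognize(samples):
    """Return (list of output characters, final state) for a sample stream."""
    state, outputs = "INIT", []
    for s in samples:
        # Unlisted directions are self-loops; an unlisted time-out resets to INIT.
        default = ("INIT", None) if s == "T" else (state, None)
        state, out = LINKS.get((state, s), default)
        if out is not None:
            outputs.append(out)
    return outputs, state
```

With these links, `recognize("DDD..")` leaves the marker waiting at AA with no output, whereas appending a time-out, as in `recognize("DDD..T")`, emits the pending 1 and returns to INIT.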


ROM  implementation 

The recognition logic can thus be thought of as a state
machine with five direction inputs (U,R,D,L,.), a timeout
input, and a set of output codes (e.g., ASCII code words).
A particularly straightforward synthesis can be achieved
with ROM logic. The use of programmable ROMs is
especially useful during the iteration of link-path structures,
because changes can easily be made in the ROM
content rather than in the hardware.

Each  state  of  the  machine  is  assigned  a  block  of 
addresses that contains all the link-path connections to
subsequent  states  plus  the  timing-out  and  conditional 
output  operations.  Because  it  is  possible  to  energize 
conditionally  an  output  port  as  well  as  advance  to  another 
state,  each  location  can  contain  either  an  output  code  or 
a  new  state  address  (indicated  by  the  most  significant 
bit  of  the  word). 

An efficient synthesis of the system can be achieved
with 8K (1024-word x 8-bit) IC chips. The most significant
bit (MSB) of each word is reserved as a flag to indicate
whether the subsequent 7 bits are to be treated as a state
address (MSB = 0) or as a 7-bit output code (MSB = 1).
The remaining 7 bits allow up to 2^7, or 128, state addresses.
Each state, in turn, has 8 link paths: the five directions
(U,R,D,L,.) plus three others discussed below. Thus, each
state occupies eight addresses, and each 1024-word, 8-bit
ROM can therefore implement 128 states, exactly the
number addressable by the 7 bits. The numerics-only
machine from which Figure 6 is abstracted contains
approximately 75 states.

As shown in Figure 8, the 7 most significant bits of
the address of any particular location in a ROM are
specified by a state-address register (SAR), which specifies
a block of eight sequential addresses. A 3-bit link-path
address register (LAR) determines which of the eight
cells within the state block is selected. The ten bits
together specify one of the 1024 words of the ROM, the
output of which contains either the next state or an output
code. The LAR is set to 0, 1, 2, 3, or 4 according to
whether the current pen direction is pen-up, up, right,
down, or left, respectively. It is set to 5 if the pen has
timed out. If the ROM word currently addressed contains
all zeros, the LAR is set to 6 on the next clock
pulse. This is used to implement a conditional output when
the next state is not INIT, as described below.
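The addressing arithmetic can be sketched as follows. The bit layout (7-bit SAR above a 3-bit LAR, MSB of the stored word as the state/output flag) and the LAR assignments 0 through 5 follow the description above; the function names and the use of `'T'` as a label for the timed-out condition are illustrative assumptions.

```python
# LAR index for each input; 'T' stands for "pen has timed out".
LAR_INDEX = {".": 0, "U": 1, "R": 2, "D": 3, "L": 4, "T": 5}


def rom_address(sar, lar):
    """Form the 10-bit ROM address from a 7-bit SAR and a 3-bit LAR."""
    if not 0 <= sar < 128:
        raise ValueError("SAR must fit in 7 bits")
    if not 0 <= lar < 8:
        raise ValueError("LAR must fit in 3 bits")
    return (sar << 3) | lar


def decode_word(word):
    """Split an 8-bit ROM word into (kind, 7-bit value) using the MSB flag."""
    kind = "output" if word & 0x80 else "state"
    return kind, word & 0x7F
```

The 128 states of 8 cells each fill the ROM exactly: the last cell of the last state is `rom_address(127, 7)`, which is 1023.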
Figure 8. ROM addressing scheme.
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To  follow  the  ROM  synthesis  in  more  detail,  consider 
the  hypothetical  machine  shown  in  Figure  9a.  If  we  have 
reached state HH (i.e., SAR contains the address HH)
and the pen turns downward, the very first clock cycle
that recognizes the D direction will set the LAR to 3.
At that indexed location is the address of state JJ, as
shown in Figure 9b, which will be clocked into the state
address register. The LAR, however, will not change as
long as the pen continues to move downward. During that
time, each clock cycle will address word 3 of state JJ,
which contains the address of state JJ. That is, state JJ is
entered via a D signal, and the marker will remain at JJ
for as long as the D signal persists. (If the address at
word 3 of state JJ were other than JJ, a sustained D
direction would have caused the marker to move away from
state JJ on the next clock cycle after entering state JJ.)

If the pen is subsequently moved to the right, the LAR
will be set to 2, and the address for state XX will be
fetched. If the pen is lifted, the node address will
remain unchanged (LAR index 0 also contains address JJ),
but if the pen remains up for longer than the specified
interval, the LAR will be set to 5, where the code for
output character J is found, and the state address register
will be reset to the address of the INIT state.

All  time-outs  and  most  normal  outputs  will  produce  a 
transition  to  the  INIT  state.  The  address  of  INIT  is 
chosen to be SAR = 0, so that this state transition can
be  produced  simply  by  clearing  the  SAR.  Owing  to  this 
choice,  the  address  of  INIT  does  not  have  to  be  stored 
in  the  ROM. 

For  conditional  outputs  which  do  not  return  to  INIT 
(called  dual  mode),  it  is  necessary  to  store  both  the  output 
code  and  the  next  state  address.  The  implementation  of 
this  feature  uses  LAR  index  cells  6  and  7. 

A dual-mode output is indicated when the contents of
the selected ROM word are all zeros. For example, at
state JJ the contents of LAR 1 and 4, i.e., U and L, are
zero. If the pen moves in either of those directions, the
all-zero ROM word will cause the LAR to be set to 6 for
one cycle and then to 7 for the next cycle. During the
first cycle, the code for character 7 will be outputted
(because the MSB of that word is 1); during the subsequent
cycle, the address for state LL will be fetched.
At state LL, a left movement will continue directly to
state MM, and a U movement will continue directly
through to state NN. That is, starting from state JJ,
a U movement will move the marker to state LL and then,
during the very next cycle, to state NN. Inserting the
extra state, LL, avoids the more complicated conditional
structure that would be necessary if we had to program
the L transition from state JJ to one state and the U
transition to a different state.
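The behavior of state JJ described above can be simulated with a small ROM image. The cell contents for JJ and LL follow the text; the numeric state addresses, the output codes (plain ASCII), and the helper names are assumptions for illustration. For brevity, the two hardware clock cycles of a dual-mode output (LAR = 6, then LAR = 7) are folded into a single call.

```python
INIT, JJ, LL, MM, NN, XX = range(6)      # hypothetical 7-bit state addresses
LAR = {".": 0, "U": 1, "R": 2, "D": 3, "L": 4, "T": 5}   # 'T' = timed out
OUT = 0x80                               # MSB set marks an output code


def program(rom, state, cells):
    """Write {LAR index: word} into the eight-word block of one state."""
    for lar, word in cells.items():
        rom[(state << 3) | lar] = word


rom = [0] * 1024
program(rom, JJ, {0: JJ, 1: 0, 2: XX, 3: JJ, 4: 0,
                  5: OUT | ord("J"),     # time-out output
                  6: OUT | ord("7"),     # dual-mode output code
                  7: LL})                # dual-mode next state
program(rom, LL, {1: NN, 4: MM})


def cycle(rom, sar, direction):
    """Process one sampled input; return (next SAR, list of emitted codes)."""
    word = rom[(sar << 3) | LAR[direction]]
    if word == 0:                        # dual mode: use cells 6 and 7
        return rom[(sar << 3) | 7], [rom[(sar << 3) | 6] & 0x7F]
    if word & OUT:                       # ordinary output: emit, clear SAR
        return INIT, [word & 0x7F]
    return word, []                      # plain state transition
```

Starting at JJ, `cycle(rom, JJ, "U")` emits the code for 7 and lands in LL; a second `cycle(rom, LL, "U")` with the pen still moving up continues to NN, matching the two-step path in the text.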

Other functions could be added to each state. For
example, movements could be quantized into more than
four directions, or different LAR locations could be
addressed if the movement in a certain direction were
greater or less than some specified duration. These added
functions would, of course, require larger blocks of addresses
for each state location.

Figure 9. ROM realization of the state machine structure.

Although many tradeoffs are possible (greater freedom
can be allowed in one character at the expense of others),
a state machine, whether realized in ROMs, PLAs, or in
a microprocessor, is efficient in handling a wide range of
variations without having to list or to account specifically
for every allowed sequence, or even every element of each
sequence. This is in contrast to a table look-up, which
requires a complete listing of all allowed sequences. The
design of recognition sequences in either case, however, is
still largely ad hoc, and the partial structure illustrated
in Figure 6 has evolved through many iterations to
improve performance.
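
The contrast between a state machine and a table look-up can be
illustrated with a minimal sketch. The state names, the transition
table, and the three characters below are hypothetical and are not
taken from Figure 6 or Figure 9; the only idea carried over from the
text is that any direction not listed for a state is treated as a
"don't care" and leaves the state unchanged.

```python
# Hypothetical transition table: (state, direction) -> next state.
TRANSITIONS = {
    ("START", "D"): "S_D",   # stroke began downward
    ("START", "R"): "S_R",   # stroke began to the right
    ("S_R", "D"): "S_RD",    # right, then down
    ("S_D", "R"): "S_DR",    # down, then right
}

# Character reported if the pen lifts while in a given state (illustrative only).
OUTPUTS = {"S_D": "1", "S_RD": "7", "S_DR": "L"}


def recognize(directions):
    """Run the direction sequence through the state machine."""
    state = "START"
    for d in directions:
        # Unlisted (state, direction) pairs are "don't cares": stay in place.
        state = TRANSITIONS.get((state, d), state)
    return OUTPUTS.get(state)  # None when no character matches
```

Because repeated or spurious directions simply leave the state
unchanged, the sequences R,D and R,R,U,D,D both end in the same state,
whereas a table look-up would need a separate entry for each.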
Performance characteristics

A major source of error for the novice is letting the
pen point rest on the writing surface in a static position
at the beginning or end of a character. A spurious pattern
of direction signals is produced because there is
strain at the pen tip even though there is no motion. To
avoid these spurious signals, the user must learn to move
the pen in the desired initial direction before (or just as)
the pen touches the paper, and to continue moving in the
final direction as the pen leaves the paper. Of course, many
spurious patterns can be tolerated (logical "don't cares"),
as has been shown in connection with Figure 6. Direction
signals are ignored during an initial dead time beginning
when the pressure threshold is first exceeded. This dead
time (typically in the range of 50-100 msec) helps to ensure
that the pen is moving in the desired initial direction
before sampling actually begins.

By delaying the use of direction-change information,
it is possible to ignore any final direction(s) that are
shorter than a minimum duration. This form of back-end
timing minimizes the effect of spurious tails at the end of
strokes.
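
The front-end dead time and the back-end timing can be sketched
together as a small filter. This is only an illustration under stated
assumptions: the function name, the sample format, and the 40 msec
minimum duration for the final direction are hypothetical (the text
gives a value only for the dead time, 50-100 msec).

```python
def clean_stroke(samples, pen_up_ms, dead_ms=75, min_final_ms=40):
    """Return the direction sequence of one stroke after timing filters.

    samples   -- list of (time_ms, direction), time measured from the moment
                 the pressure threshold is first exceeded, in ascending order
    pen_up_ms -- time at which the pen left the paper
    """
    # Front-end dead time: ignore direction signals just after pen-down.
    kept = [(t, d) for t, d in samples if t >= dead_ms]

    # Collapse consecutive identical directions into runs [direction, start].
    runs = []
    for t, d in kept:
        if not runs or runs[-1][0] != d:
            runs.append([d, t])

    # Back-end timing: drop a final direction that lasted too briefly.
    # A stroke consisting of a single run is kept regardless of its length.
    if len(runs) > 1 and pen_up_ms - runs[-1][1] < min_final_ms:
        runs.pop()

    return [d for d, _ in runs]
```

For example, a downstroke that starts with a brief spurious "L" and
ends with a 10 msec "R" tail is reduced to the single direction "D".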

Another initial difficulty is learning to hold the pen
vertically. Any tilt biases the force pattern in the direction
of the tilt. With strong tilt to the left, for instance,
the direction encoder could continue to signal "left," even
though the pen were actually moving in another direction.

We have developed several effective aids to learning.
Four  direction  lights  continually  signal  the  instantaneous 
direction  of  writing,  as  determined  by  the  signal  processor. 
Also,  each  character  can  be  displayed  on  an  accumulating 
alphanumeric  visual  display  and/or  repeated  audibly  by 
loudspeaker  or  earphone  as  it  is  recognized. 

Most  users  quickly  adapt  to  the  smooth  movements 
required  as  the  pen  touches  and  leaves  the  paper  and  to 
the  need  to  hold  the  pen  reasonably  vertical,  and  error 
rates  typically  drop  to  a  few  percent  within  an  hour  or 
so  of  practice.  After  this  learning  period,  surprisingly 
variable  writing  can  be  tolerated,  as  illustrated  in  Figure  10, 
which  shows  an  array  of  characters  written  at  one  sitting 
by  a  single  user,  in  which  every  character  was  correctly 
recognized.  Note  in  particular  that  the  system  is  inherently 
independent  of  character  size  and  quite  tolerant  of  sloppy 
printing. 

Practical  systems  can  be  designed  around  this  pen  for 
relatively small character sets, e.g., the ten digits plus a
few  special  characters  such  as  erase  and  space.  A  state 
logic  system  for  more  than  40  alphanumeric  characters 
has  also  been  designed.  Users  experienced  with  the 
numerics-only  set  can  perform  reasonably  well  with  this 
larger  set.  However,  it  is  not  yet  clear  whether  a  practical 
system  with  this  many  characters  could  be  designed  for 
a  broad  range  of  users. 


Format  control 

Thus  far  we  have  considered  only  the  problem  of 
recognizing  isolated  characters  as  they  are  produced  by 
the  pen.  However,  as  noted  earlier,  the  system  has  no 
measure  of  absolute  pen  position  in  space.  In  using  the 
pen  to  fill  out  a  form,  it  is  necessary  to  specify  the  box 
or  area  being  filled  out  at  any  moment. 

For use with common forms rather than free-format
entry, the pen itself can enter format information. By
letter code, the pen can specify to the system what fields
of data are to be entered and in what order, how large
each field is, and whether the field is numeric or
alphanumeric. For final verification of data before entry to the
computer system, the entered data can be displayed on a
screen in the format of the form being filled out.

Figure 10. Random characters recognized without error. Note
that the system is inherently independent of character size.

The pen, in other words, can be used in two modes
within  an  integrated  system.  In  the  format  entry  mode,  a 
computer  processor  is  programmed  by  the  pen  to  accept 
certain  kinds  of  information  in  a  particular  format,  as  an 
intelligent  terminal  might  be  programmed.  In  the  data 
entry  mode,  the  system  accepts  the  detected  characters 
as  data. 

Discussion 

The  pen  described  in  this  paper  permits  a  system 
design  requiring  no  special  writing  surface  or  special 
writing  environment. 

In  contrast  to  OCR  schemes,  which  suffer  from  paper- 
related  problems  such  as  dirt  smudges,  breaks  in  the  ink 
pattern,  and  folds  in  the  paper,  the  scheme  described  here 
uses  information  derived  from  the  pen  itself,  not  from 
the  writing  on  the  paper.  The  final  image  is  irrelevant 
to  the  character  recognition  process,  and  the  paper  can 
immediately  be  reduced  to  archival  status.  Because  of  the 
simplicity  of  the  recognition  logic  and  the  elimination 
of  special  paper-handling  requirements,  the  total  system 
can  be  small  and  portable. 

Static and dynamic methods of character recognition
might be usefully complementary for very large character
sets (e.g., Chinese script) that neither technique
alone could handle. Characters having similar dynamic
patterns but distinctly different static forms can be
separated by static methods. For example, the letters P and
D drawn as (D...R,D,L...) sequences are indistinguishable
by the dynamic method discussed here, although statically
they are easily distinguishable. Similarly, characters with
similar stroke configurations, i.e., similar static forms,
in which the strokes are made in different sequences, can
be distinguished by dynamic methods. In other words,
certain dynamic information captured as the material is
written may be useful during subsequent processing even
if not adequate alone for real-time processing.

The system delivers ASCII code words as output and
is compatible, therefore, with computer teletype ports.
The strain-gauge transducers need to be sampled only
about 30 times per second, and only changes in direction
need be transmitted to the logic processor. Thus, only a
small amount of preprocessor circuitry need be connected
with each pen, and the direction information can be
transmitted with low bandwidth to a central processor.
THE SRI PEN SYSTEM FOR AUTOMATIC SIGNATURE VERIFICATION


Hewitt D. Crane, Daniel E. Wolf, and John S. Ostrem
Stanford  Research  Institute,  Menlo  Park,  California  94025 


I.  INTRODUCTION 

A need has been growing in recent years for a practical,
automatic, personal identification system in
both government and private business. Applications
range from government high security, such as controlling
access to sensitive areas, to protection of access
to computer facilities and data banks. Most of the
methods of personal identification so far developed
have been based on fingerprints, voice, personal
identification numbers (PIN), physical features such as
hand geometry, and naturally, the handwritten signature.
Signature verification is one of the most promising
techniques, considering psychological acceptance,
technical feasibility, and cost.

By "signature verification" we mean the following:
The person whose identity is to be verified gives a
name or ID number and writes a signature, which will be
referred to as the test signature. The test signature
is then compared with a computer-stored representation,
called the template, of the signature corresponding to
the given name or ID number. If the test signature is
"close" enough to the template by some appropriate
measure, the person's identity is verified; if not, he is
judged an imposter.


Automatic signature verification requires a representation
of the written signature in a form suitable
for computer input and subsequent data processing.
There are basically two ways to obtain such a signature
representation. One is to scan the signature optically
after it has been written; this technique is similar in
principle to that used for optical character recognition.
However, optical scanning devices usually are
bulky, expensive, and generally unsuited for real-time
applications of signature verification. A more attractive
and useful approach is to have either the writing
device or the writing surface generate signals
representative of the signature while it is being written.

In this paper, we describe an automatic, real-time
signature verification system that has been developed
at Stanford Research Institute (SRI). We present
Type I (true-signer rejection) and Type II (forger
acceptance) error rates as determined from tests on a
first data base of true signatures and attempted
forgeries. In the discussion in Section VI we state why we
believe the results presented are conservative and will
be improved in the future.


II. SRI THREE-AXIS PEN

The SRI signature verification system uses a
strain-gauge-instrumented ballpoint pen, shown in Figure 1,
that was developed by Crane at SRI. A small array of
strain gauges near the ballpoint tip generates three
electrical signals that are representative of the
instantaneous three-dimensional drag force at the writing
tip. Specifically, three independent orthogonal
components of the total drag force are measured: downward
force perpendicular to the plane of the writing surface
(henceforth called pressure, or P), far/near force in
the plane of the writing surface (called Y), and
left/right force in the plane of the writing surface (called
X). Each of the three force signals has a high
signal-to-noise ratio. The pen has an ordinary writing tip
and it requires no special writing surface.


III. SIGNATURE VERIFICATION: PARAMETERS METHOD

The signature verification process is based on a
template matching procedure in which the P, X, and Y
force signals generated during the writing of the test
signature are compared against the P, X, and Y force
signals of the appropriate template stored in a computer.
The comparison can be made in many ways. But in
general, a numerical measure of the "closeness" of the
test signature to the template is computed and compared
against a preset value, which we call the decision
threshold. If the numerical measure of "closeness" is
less than or equal to the decision threshold, then the
test signature is judged to be a true signature. If
the measure is greater than the decision threshold,
it is judged a forgery. A parameters (or features)
technique computes as the numerical measure of closeness
a normalized vector difference between a set of
feature values extracted from the test signature and
the corresponding feature values of the appropriate
template. This technique is computationally efficient,
requires only a small amount of template storage for
each system user, and can be implemented in a stand-alone
microprocessor unit. Other, more sophisticated
verification techniques are discussed briefly in
Section VI.

In the parameters technique, a number of parameter
values (features) are extracted from the three continuous
force signals generated by the pen during the writing.
These features include the total time of the
signature, the time the pen is on the paper, the time
the pen is off the paper, the average force in each of
the three dimensions, the average energies, the average
angle of writing, and many others. It is likely that
not all of the extracted features will be equally
effective for discriminating between true signatures and
attempted forgeries. Also, it is desirable to reduce
the number of features to save computation time and
template storage space. For these reasons, a feature
selection technique is used to select those features
most effective (resulting in the least probability of
error) in discriminating between true signatures and
forgeries. Thus far in our analysis, we have examined
more than fifty such features. By application of a
standard F-ratio method of analysis (see reference 2),
typically we reduce the number of features to between
10 and 20. Either a uniform set of features is used
for all subjects or sets that are personalized for each
subject are used.


Given a set of features, the decision-making algorithm
used for deciding if a particular test signature
is a true signature or a forgery is as follows: When
a test signature is written, a value for each of the
features is extracted from the P, X, and Y signals.
The test signature may thus be represented by a feature
vector $\mathbf{s}$, defined as

$$\mathbf{s} = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_f \end{bmatrix} \qquad (1)$$

where $f$ is the number of features extracted from the
test signature and $s_i$ is the value of the $i$th feature.
To determine if the test signature is a true signature
or a forgery, the feature vector, $\mathbf{s}$, must be compared
with the appropriate template vector, $\bar{\mathbf{t}}$. A template
vector $\bar{\mathbf{t}}$ must be obtained for each system user. This
requirement necessitates a simple enrollment procedure
in which each user signs several of his true signatures.
The template vector for each user is constructed by
averaging the $N$ signatures obtained during the
enrollment procedure. Thus, for a particular user

$$\bar{\mathbf{t}} = \begin{bmatrix} \bar{t}_1 \\ \bar{t}_2 \\ \vdots \\ \bar{t}_f \end{bmatrix} \qquad (2)$$

where

$$\bar{t}_i = \frac{1}{N} \sum_{j=1}^{N} t_{ij} \qquad (3)$$

is the average value of the $i$th feature and $t_{ij}$ is the
value of the $i$th feature for the $j$th true signature
obtained during an enrollment procedure in which $N$
signatures are taken.

A template covariance matrix, $C$, can be computed as

$$C = \begin{bmatrix} \sigma_{11}^2 & \cdots & \sigma_{1f}^2 \\ \vdots & \ddots & \vdots \\ \sigma_{f1}^2 & \cdots & \sigma_{ff}^2 \end{bmatrix} \qquad (4)$$

where

$$\sigma_{ik}^2 = \frac{1}{N-1} \sum_{j=1}^{N} (t_{ij} - \bar{t}_i)(t_{kj} - \bar{t}_k) \qquad (5)$$

are the unbiased estimators of the elements of the
covariance matrix, $C$. The diagonal elements are the
variances of the respective parameters; $\sigma_{ii}^2$ is the variance
of the $i$th feature, and $\sigma_{ii} = \sqrt{\sigma_{ii}^2}$ is the
corresponding standard deviation.

Under the explicit assumption that the set of $f$
features is distributed jointly as a multivariate
Gaussian density, it can be shown that an optimum rule
for classifying a test signature as true or as a
forgery is the following.

Compute the distance metric

$$d(\mathbf{s}) = \left[ \frac{1}{f} (\mathbf{s} - \bar{\mathbf{t}})^{\dagger} C^{-1} (\mathbf{s} - \bar{\mathbf{t}}) \right]^{1/2} \qquad (6)$$

(where $\dagger$ indicates the transpose operation and $C^{-1}$ is
the inverse of $C$), and declare that the test signature
is true if $d(\mathbf{s})$ is equal to or less than the decision
threshold and that otherwise it is a forgery.

Unfortunately, the distance metric of Equation 6
has several disadvantages for application in a practical
system if $f$ is large. The matrix inversion of $C$
may be quite time-consuming; considerable space for
template storage is required (since an entire covariance
matrix must be stored for each user); and, most
important, a large number of true signatures (typically,
several times $f$ signatures) is required to obtain a
statistically confident estimate of each user covariance
matrix, thus leading to a much-extended enrollment
procedure. For these reasons, we employ a simpler form
of distance metric, obtained by assuming that the
features are mutually statistically independent. In this
case, all the off-diagonal elements of the covariance
matrix are zero (i.e., $\sigma_{ij}^2 = 0$ for $i \neq j$), and Equation
6 reduces to

$$d(\mathbf{s}) = \left[ \frac{1}{f} \sum_{i=1}^{f} \left( \frac{s_i - \bar{t}_i}{\sigma_i} \right)^2 \right]^{1/2} \qquad (7)$$

where for convenience we have compressed the notation
by setting $\sigma_{ii} = \sigma_i$.

The distance metric of Equation 7 is simple, fast
to compute, and requires only 5 to 10 true signatures
for the user to be enrolled. However, some loss of
performance is expected if the set of features has
significant linear correlation.
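
The enrollment and decision steps can be sketched in a few lines. The
sketch assumes that the Equation 7 metric is the root-mean-square of
the per-feature differences, each normalized by that feature's
standard deviation (consistent with the name "RMS difference" used in
Section V). The function names and the default threshold of 1.7 are
illustrative choices, not part of the SRI system.

```python
import math
from statistics import mean, stdev


def enroll(signatures):
    """Build a template from N enrollment signatures (lists of f feature values).

    Returns per-feature means and standard deviations; stdev() divides by
    N - 1, i.e. it is the unbiased estimator.
    """
    columns = list(zip(*signatures))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def rms_difference(s, means, stds):
    """RMS of the normalized differences between test vector and template."""
    z = [(si - m) / sd for si, m, sd in zip(s, means, stds)]
    return math.sqrt(sum(v * v for v in z) / len(z))


def verify(s, means, stds, threshold=1.7):
    """True signature if the distance does not exceed the decision threshold."""
    return rms_difference(s, means, stds) <= threshold
```

Only two numbers per feature are stored. A feature whose enrollment
values are all identical has zero standard deviation and would have to
be excluded or floored before this sketch could be used.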

In fact, it is probable that the signature verification
features are not jointly distributed as a multivariate
Gaussian density. In this case, neither of the
previously shown distance metrics is known to be optimum,
and it is not clear that the distance metric given
by Equation 7 will yield worse performance than the
more complex distance metric given by Equation 6, even
if the features are significantly linearly correlated.
We therefore use the distance metric of Equation 7
because, even though it probably is not optimum, it is
still a reasonable classification algorithm that has
yielded good performance in prior studies, and has all
the advantages previously mentioned for application to
a practical signature verification system.

Using the distance metric of Equation 7 requires
that two numbers (an average value and a standard
deviation) be stored for each feature of a subject's
template. Basing the analysis on, say, 10 features
therefore requires storing 20 numbers per subject
(approximately 200 bits). By selecting a set of features for
each subject, it may be possible to use only 5 to 10
features per subject, both reducing storage requirements
and improving performance.

IV.  DATA  BASE 


A data base of true signatures and attempted forgeries
has been obtained for the purpose of estimating
the Type I and Type II error rates for this system.
Sixteen persons selected randomly from a larger group
of volunteers were subjects for the data base. Included
were secretaries, research assistants, and engineers.
Each subject was given a set of written instructions
describing the procedure for the sign-in sessions
and was scheduled to appear for between one and three
sign-in sessions per week over a period of three months
for a total of 16 sign-in sessions. At each session,
the subject signed his or her own signature three times
and attempted two forgeries of one of the other data
base members. For the forgery attempts, each subject
was given several copies of the signatures of all the
other subjects and a written form that stated that the
signature verification system based the true/forgery
decision on matching the forces and motions involved in
writing a signature, and was encouraged to practice
prior to the formal forgery attempts. The subjects
were not given any feedback either on whether their
true signatures were verified or whether their forgery
attempts were accepted or rejected.
The P, X, and Y force signals for each of the true
signatures and attempted forgeries were stored on
magnetic tape. At the conclusion of the specified time
period, about 800 true signatures and 425 attempted
forgeries had been collected.

V. RESULTS

The data were analyzed in several ways.

The 800 true signatures were divided randomly into
two groups of 400. The selection of features for each
subject was made using an F-ratio analysis to choose
those features that were most effective in discriminating
his or her true signatures against all other
true signatures of the first group of 400. (This technique
of feature selection could be automated in a
practical signature verification system.) The actual
error rate calculations then were performed using the
second group of 400 true signatures together with all
of the 425 forgeries. A template for each subject was
constructed by averaging together all his or her true
signatures in the second group of true data. The
calculated error rates, using the decision-making
algorithm of Equation 7, are shown in Figures 2, 3, and 4.

To calculate the Type I error rate, a value of the
distance metric must be computed for each true signature.
However, if the distance metric for a particular
true signature is computed using a template that
includes that particular signature, the resultant error
rate will be overly optimistic. For this reason, we
subtracted each true signature from the template when
its distance metric was computed.
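
Removing a signature from its own template before scoring it amounts
to a leave-one-out evaluation. A minimal sketch follows, assuming the
RMS-normalized form of the Equation 7 metric and recomputing the
template from scratch for each held-out signature; the function name
is hypothetical, and the original analysis may have removed each
signature's contribution in a more economical way.

```python
import math
from statistics import mean, stdev


def leave_one_out_distances(signatures):
    """Score each true signature against a template built from all the others.

    signatures -- list of feature lists for one subject (at least three,
                  so that a standard deviation can still be estimated)
    """
    distances = []
    for j, s in enumerate(signatures):
        others = signatures[:j] + signatures[j + 1:]   # exclude signature j
        columns = list(zip(*others))
        means = [mean(c) for c in columns]
        stds = [stdev(c) for c in columns]
        z = [(si - m) / sd for si, m, sd in zip(s, means, stds)]
        distances.append(math.sqrt(sum(v * v for v in z) / len(z)))
    return distances
```

An outlying signature scores much higher this way than it would
against a template that it helped to form, which is why the text
describes the uncorrected estimate as overly optimistic.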

Toward the end of the data base, we observed that
the error rates seemed to increase. We believe this
was because some subjects eventually lost interest and
became careless, owing to the lack of feedback and
motivation in the experimental design. This perhaps
can be minimized by better experimental design and may
well not be an important factor in an operational
system. For this reason, we considered it of interest
to perform again the error rate analysis, but this
time excluding some of the later data-taking sessions.
Thus, error-rate calculations that use only the first
40 data files, out of a total of S>, are presented in
Figure 5.

Finally, we analyzed the first 40 data files using
pressure- and timing-related features only, to test how
much improvement can be expected from a system that
utilizes a 3-axis pen over a single-axis (i.e.,
pressure-only) pen. These results are shown in Figure 6.

All Data Files

Figures 2(a) and 2(b) are computer printouts in
increments of 0.1 in RMS difference (we call the
calculated value of the distance metric, Equation 7, the
RMS difference) for the true signatures and forgeries
of each subject. Figure 2(a) summarizes the
true-signature data and Figure 2(b) displays the forgery
data. From Figure 2(a), we see that the first subject
(JAB) entered 28 true signatures (the sum of column 1),
ranging in value of RMS difference from 0.1 to 1.7.
The second subject (JLC) entered 2) true signatures
(the sum of column 2), ranging from 0.3 to 1.6. From
Figure 2(b), we see that there were 30 attempted
forgeries of subject JAB, the closest having a value of 2.(1.
There also were 30 attempted forgeries of subject JLC,
the closest having a value of 4.4.

Figure 3 is a distribution plot of the true and
forgery values across all subjects. We see, for example,
that 14 of the 393 true signatures had RMS
difference values greater than 1.6, and 8 of the 425
forgeries had RMS difference values less than 1.6.
Alternatively, 379 (or 96.4 percent) of the true
signatures had RMS values less than 1.6, and 417 (or 98.1
percent) of the forgeries had RMS values greater than
1.6.

The overlap between the true and forgery data is
the source of the Type I and Type II errors. The
magnitude of each type of error is a function of the value
of the threshold for the RMS difference that is chosen,
above which a signature is called false and below which
it is called true. Magnitude of error as a function of
threshold is shown in Figure 4 for the data of Figure 3.
Note that the Type I and Type II errors have an equal
value (1.7 percent) at an RMS threshold level of 1.73.
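
The dependence of both error rates on the threshold can be made
concrete with a short sketch. The function names are hypothetical, and
the distance values used in the usage note are made-up illustrations,
not data from Figure 3.

```python
def error_rates(true_d, forged_d, threshold):
    """Return (Type I, Type II) error rates as fractions.

    Type I  -- true signatures rejected (distance above the threshold)
    Type II -- forgeries accepted (distance at or below the threshold)
    """
    type1 = sum(d > threshold for d in true_d) / len(true_d)
    type2 = sum(d <= threshold for d in forged_d) / len(forged_d)
    return type1, type2


def equal_error_threshold(true_d, forged_d, thresholds):
    """Pick the candidate threshold at which the two rates are closest."""
    def gap(th):
        type1, type2 = error_rates(true_d, forged_d, th)
        return abs(type1 - type2)
    return min(thresholds, key=gap)
```

With illustrative true distances [0.5, 0.9, 1.2, 1.9] and forgery
distances [1.5, 2.4, 3.0, 4.1], a threshold of 1.7 rejects one true
signature in four and accepts one forgery in four. Raising the
threshold trades Type I errors for Type II errors, and lowering it
does the reverse.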

First 40 Data Files

Figure 5 shows the error results when only the first
40 data files are considered. Actually, the equal
Type I/Type II error rate is the same (1.7 percent),
although the Type II error rate falls more rapidly with
lower threshold values. Thus, at a threshold RMS level
of 1.8, there is a 0.7 percent Type II error rate (i.e.,
forgery acceptance) and 2.3 percent Type I error rate
(i.e., true signature rejection).

Pressure and Timing Parameters Only

Figure 6 shows the error rate plots when the 40 data
files of the previous section are rerun with all of the
X-related and Y-related parameters deleted. The equal
error rate is 3 percent.

VI. DISCUSSION

These results must be treated as strictly preliminary.

No one involved with the development of the pen or
signature verification system was also involved in any
of the data-taking sessions. In this way, we hoped to
eliminate any biasing of results that might have been
caused, for example, by subconscious coaching of the
subjects by those who knew the system best. However,
we would do some things differently in developing
another data base. First, we would choose a different
location for the test (a number of subjects complained
after the data base was completed that the computer
room in which the data was taken was very cold and
their hands felt stiff). Second, we would shorten the
time over which the data is collected or try to increase
the motivation of the subjects. We found that the
true-signer templates tended to develop larger standard
deviations toward the end of the data base collection
period, probably because of the lack of motivation and
resultant loss of interest noted earlier. Because of
the greater template variances, the forgery-acceptance
rate increased. For this reason, we expected
significantly better results from the first portion of the data
base, although the curves of Figure 5 do not show as
much improvement as might have been predicted.

Conservative Aspects of the Results

We believe that these results are conservative in
three major ways. First, the data was taken with an
early, semi-production model of pen, which, unfortunately,
had a round body. Subsequently, we have obtained
improved results with a triangular-shaped pen of the
type shown in Figure 1. The X and Y signals from the
pen are sensitive to the angle of "roll" about the main
axis of the pen. With a triangular body, the subject
grasps the pen much more consistently every time. Of
course, any parameters that are sensitive to roll could
be eliminated from the analysis, but including them
results in a great improvement in performance (although
consistent roll angle should not have much effect on
pressure and timing parameters). This result was first
noticed in a formal way when taking a small data base
of Chinese signatures using native-born Chinese.
Chinese characteristically hold a pen loosely, somewhat
like an artist's paint brush. Performance with a
round-bodied pen, initially poor, became comparable to
the results shown here when a triangular-shaped pen
was substituted while the group of subjects remained
the same.

Second, for the purpose of analyzing this data
base, an individual's template was made by averaging
true signatures taken over a period of three months.
A more likely procedure, at least in some operating
systems, would be to use the first half-dozen or so
signatures for each subject as his or her template,
and update periodically by averaging in signatures
that are verified with an RMS difference value less
than, say, 1.0 or 1.2. In this way, the template
automatically would track any slow changes in the subject's
signature, and the subject's standard deviation values
would tend to be smaller, making his signature more
difficult to forge. However, we were not able to
reanalyze the data on that basis for this paper.
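
The updating procedure suggested above can be sketched as follows.
This is a hypothetical illustration only: the class name, the choice
to keep every accepted sample, and the acceptance threshold of 1.7 are
assumptions. The update threshold of 1.0 is one of the two values
mentioned in the text, and the metric is again assumed to be the
RMS-normalized form of Equation 7.

```python
import math
from statistics import mean, stdev


class Template:
    """Per-user template that absorbs well-matched signatures over time."""

    def __init__(self, enrollment):
        # Start from the first half-dozen or so enrollment signatures.
        self.samples = [list(s) for s in enrollment]

    def rms_difference(self, s):
        columns = list(zip(*self.samples))
        z = [(si - mean(c)) / stdev(c) for si, c in zip(s, columns)]
        return math.sqrt(sum(v * v for v in z) / len(z))

    def verify_and_update(self, s, accept=1.7, update=1.0):
        d = self.rms_difference(s)
        if d < update:
            # Only very close matches are averaged into the template.
            self.samples.append(list(s))
        return d <= accept
```

A signature can therefore be accepted without being used for
updating, which keeps marginal matches (and near-miss forgeries) from
pulling the template away from the true signer.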

Finally, while the F-ratio technique for feature
selection is simple and efficient, it is by no means
optimal. Its primary disadvantage is that it evaluates
separately the discrimination power of each feature,
ignoring the effects of interfeature correlations.
Also, the F-ratio has no definite relation to the
probability of error, except when the distributions of
values for a feature for the true signatures and
forgeries are both Gaussian and distributed with equal
variances. Therefore, we believe that the process of
feature selection can be improved, and that the error
rates probably can be reduced. We have begun to
explore other methods for selecting sets of features,
which will be reported at a later date.
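
For orientation, a per-feature F-ratio ranking can be sketched as
below. The definition used here (variance of the per-signer means
divided by the average within-signer variance) is one common form of
the F-ratio and is an assumption; the paper defers the exact
definition to its reference 2. The function names and data layout are
hypothetical.

```python
from statistics import mean, variance


def f_ratio(groups):
    """F-ratio of one feature.

    groups -- one list of feature values per signer (two or more values each)
    """
    group_means = [mean(g) for g in groups]
    between = variance(group_means)               # spread of the signer means
    within = mean([variance(g) for g in groups])  # average spread per signer
    return between / within


def select_features(data, k):
    """Return the indices of the k features with the largest F-ratio.

    data[i] holds the groups (one list per signer) for feature i.
    """
    ranked = sorted(range(len(data)), key=lambda i: f_ratio(data[i]), reverse=True)
    return ranked[:k]
```

Each feature is scored on its own, so two highly correlated features
can both rank near the top even though the second adds little. This is
the limitation noted in the text.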

Other Forms of "Signature"

Our discussion has emphasized actual signatures,
but the system works well also with symbols of any
form, such as the user's initials, the digits from 0-9,
or one's telephone number. No formal data, however,
has thus far been taken with other than normal
signatures.

Other  Forms  of  Devices 

We have described a three-axis pen as an input
device to a signature verification system. A three-axis
platen system also has been developed at SRI for this
purpose.1 With this device, the user can write with
an ordinary pen or pencil. Such a system might have
significant advantages in certain applications,
although informal data show that it may yield somewhat
poorer performance than a system that utilizes a pen.

In fact, its performance is likely to stand intermediate
between a one-axis (pressure-only) pen and a three-axis
pen. A one-axis pen is completely insensitive to
X and Y forces. While a three-axis platen does generate
X and Y signals, the signals are independent of the
way the pen is held by the user. For instance, a line
drawn from left to right on the platen will generate a
pure X signal, regardless of pen orientation. With the
three-axis pen, however, the coordinate system is
attached to the pen, and therefore the X and Y signals
are dependent on pen orientation. For instance,
left-handed and right-handed users typically have a
180-degree shift in X,Y orientation. In other words, a
three-axis pen provides more information with which to
distinguish writers. In fact, the choice of all
right-handed subjects in the data base is yet another
conservative aspect of the results, inasmuch as left-handed
and right-handed users generally are easily distinguished.

Correlation  Methods 

We have developed also a correlation method for
signature verification. In this method, the P, X, and
Y time series force-signals of a test signature are
correlated mathematically against the appropriate P,
X, and Y template signals. If the test signature's
correlation is greater than some preassigned threshold,
it is judged true, and, if not, it is judged a forgery.
However, straight mathematical correlation often yields
poor results because of the normal variations in
different true signatures. Even though the test
signature and template P, X, and Y signals may be highly
correlated by a subjective, visual comparison, small
time shifts within the test signature P, X, and Y
signals can cause important phase shifts with respect
to the template P, X, and Y signals. To compensate
for this effect, we have developed a number of
techniques based on what we call "rubbery" correlation.
In these methods, an automatic two-dimensional fitting
procedure is used to find an optimal match between the
template and test signatures, allowing time base
translation and time warping (stretch and contraction) of
the test signature P, X, and Y signals. These
procedures can be applied independently to different
parts of the signature: for instance, to the
first half of the template and test signals and then
independently to the second half of the signals; or
the analysis can be done in thirds.
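
The idea of fitting over a time translation and a time stretch before correlating can be illustrated with a brute-force search. This is a minimal sketch, not the fitting procedure used in this work: it handles a single channel (one of P, X, or Y), uses a simple grid search over two parameters, and all function names (`pearson`, `warp`, `rubbery_correlation`) are hypothetical.

```python
import math

def pearson(a, b):
    """Ordinary (straight) correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def warp(signal, shift, stretch, n_out):
    """Resample `signal` at times shift + stretch*i using linear
    interpolation, clamping to the ends of the signal."""
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        t = min(max(shift + stretch * i, 0.0), float(last))
        lo = int(t)
        hi = min(lo + 1, last)
        frac = t - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def rubbery_correlation(template, test, shifts, stretches):
    """Search the (shift, stretch) grid for the best correlation.

    Returns (best correlation, best shift, best stretch).
    """
    best = (-1.0, None, None)
    for s in shifts:
        for k in stretches:
            r = pearson(template, warp(test, s, k, len(template)))
            if r > best[0]:
                best = (r, s, k)
    return best
```

Applying the search separately to the first and second halves of the signals, as described above, would amount to calling `rubbery_correlation` once per segment and combining the results.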

This method requires approximately ten times as
much storage per subject (several thousand rather than
several hundred bits per user) but has the potential
to yield significantly better performance than the
parameters method. With correlation, even if a
potential forger has all of the raw signal data available,
he would have to be able to translate the 3-axis
visual information into appropriate muscle responses
with great accuracy. Preliminary results show that it
is very difficult for even a determined individual to
learn to make such a match. These rubbery correlation
methods will be reported in a future paper.

We believe that for some applications, and
depending on the degree of performance required, there may
be value in using both the quick-and-easy parameters
method and the more sophisticated correlation methods.
For instance, the methods of analysis have a degree
of independence so that their simultaneous application
should result in improved true/forgery discrimination.

VII. SUMMARY

We have described an automatic signature
verification system. The system uses a ballpoint pen equipped
with an array of strain gauges mounted near the
ballpoint tip. The gauges record the instantaneous
three-dimensional, or three-axis, drag force generated at
the tip during writing, and these signals are utilized
by the verification system. In other words, the system
analyzes the dynamics of writing rather than the
static image produced by the pen. In fact, the system
makes attempting to trace someone else's signature
one of the worst possible strategies for forgery.

We have described a method of analysis called the
"parameters method." It is computationally simple
and can be realized with current microprocessor
technology. Templates for each user consist of
approximately 200 bits which can be stored in a central data
file or be encoded on a card carried by the user.
In this first reported data base, we have found
equal Type I/Type II error rates in the range of 1 to
2 percent. We have stated why we believe these results
are conservative. First, a round-bodied pen was used
in collecting the data; we find much better
performance with a pen that has a triangular-shaped body,
which tends to be held more consistently each time.
Second, the true-signature templates for the error
analysis were formed from true signatures taken over
the entire collection period of several months; more
practical template-making procedures likely would
utilize only the subject's most recent signatures,
which generally lead to much "tighter" templates
(i.e., smaller standard deviation values, which are
more difficult to forge). Third, a straightforward
F-ratio analysis technique was used for selecting
features. However, this is not an optimal method.
Currently, we are exploring methods that we hope
will lead to an automatic means of selecting optimum
feature sets that likely will be different for each


We have noted also a method of correlation
analysis. This method requires about ten times as much
template storage per user, but is a more effective
method of analysis. Both methods may be applied
simultaneously,
                               3   0   1   1   1   1   0   0   0   0   1     8
4.0 to 4.1     1   0   0   0   1   1   0   0   1   1   0   1   1   0   0     7
4.1 to 4.2     0   0   0   2   2   0   1   0   0   4   0   1   2   0   0    12
4.2 to 4.3     0   0   0   1   2   0   0   1   0   0   0   0   0   3   0     7
4.3 to 4.4     1   0   0   0   0   0   0   2   0   0   0   1   0   1   0     5
4.4 to 4.5     0   2   0   1   0   1   0   1   0   0   2   1   1   1   0    10
4.5 to 4.6     1   0   1   2   0   1   1   0   2   1   3   0   0   1   0    13
4.6 to 4.7     0   0   0   0   1   0   1   1   0   1   1   0   0   1   1     7
4.7 to 4.8     0   0   1   0   0   1   0   0   0   1   1   0   0   0   0     4
4.8 to 4.9     2   0   0   0   1   0   0   2   2   1   0   1   0   0   0     9
4.9 to 5.0    11  28  IS  14   7   2  18   6  22   2   3  12   a   3  16   181
Total         30  30  31  30  30  27  30  30  30  30  29  26  18  28  26   425

Figure 3(b) Computer print-out for forgery data.
Appendix D

MAXIMUM  LIKELIHOOD  ESTIMATION 




In the main text of this report, a binomial distribution was used to
describe the probability of R false rejects in T trials. For a single trial
the equivalent relation is (for the ith trial)

$$\mathrm{Prob}\{z_i\} = P^{z_i}\,(1-P)^{1-z_i}$$

where P is the true population error rate to be estimated. In the above
relation $z_i = 1$ if the ith trial is a false rejection, and $z_i = 0$ if
the ith trial is a correct verification. Thus, for example, the probability
that the ith trial is a false rejection is $\mathrm{Prob}\{z_i = 1\} = P$,
and the probability that it is a correct verification is
$\mathrm{Prob}\{z_i = 0\} = 1 - P$.

The  goal  is  to  derive  an  estimate  for  P  that  can  be  calculated  using  a 
verification  data  base  and  that  in  some  sense  best  agrees  with  the  actually 
observed  data.  The  maximum  likelihood  approach  yields  one  such  estimate.* 

The first step in the procedure is to form the likelihood function L(P).
Assuming independent trials, the joint probability distribution for T trials
is

$$\mathrm{Prob}\{z_1, z_2, \ldots, z_T\} = \prod_{k=1}^{T} P^{z_k}\,(1-P)^{1-z_k}$$

The likelihood function is defined as the logarithm of
$\mathrm{Prob}\{z_1, z_2, \ldots, z_T\}$:

$$L(P) = \log\left[\mathrm{Prob}\{z_1, z_2, \ldots, z_T\}\right]
       = \sum_{k=1}^{T} \log\left(P^{z_k}\,(1-P)^{1-z_k}\right)
       = \left(\sum_{k=1}^{T} z_k\right) \log P
         + \left[\sum_{k=1}^{T} (1-z_k)\right] \log(1-P)$$

*The  theoretical  foundation  of  maximum  likelihood  estimation  is  too  involved 
to  treat  here.  For  more  details,  see,  for  example,  H.  Cramer,  Mathematical 
Methods  of  Statistics  (Princeton  University  Press,  1951). 
L(P) is maximized in the usual way by setting its derivative with respect to
P equal to 0. This yields

$$\frac{1}{P}\sum_{k=1}^{T} z_k - \frac{1}{1-P}\left(T - \sum_{k=1}^{T} z_k\right) = 0,$$

which implies that

$$\hat{P} = \frac{1}{T}\sum_{k=1}^{T} z_k.$$

From the definition of $z_k$, we know that $\sum_{k=1}^{T} z_k$
is simply the total number R of false rejects in T trials, so the maximum
likelihood estimate of the error rate is

$$\hat{P} = \frac{R}{T}.$$
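
As a numerical check on this derivation, the sketch below evaluates the log-likelihood for a made-up set of trial outcomes and confirms that a grid search peaks at R/T. The function names and the sample data (3 false rejects in 20 trials) are illustrative assumptions, not values from the data base.

```python
import math

def log_likelihood(p, outcomes):
    """L(P) = (sum z_k) log P + (T - sum z_k) log(1 - P), for 0 < P < 1."""
    r = sum(outcomes)          # number of false rejects, R
    t = len(outcomes)          # number of trials, T
    return r * math.log(p) + (t - r) * math.log(1 - p)

def mle_error_rate(outcomes):
    """Closed-form maximum likelihood estimate, R / T."""
    return sum(outcomes) / len(outcomes)

# Hypothetical data: z_k = 1 marks a false rejection, 0 a correct verification.
outcomes = [1] * 3 + [0] * 17
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, outcomes))
```

Here `best_on_grid` and `mle_error_rate(outcomes)` should agree to within the grid spacing, since the log-likelihood has a single maximum on the open interval (0, 1).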


CONFIDENCE  LIMITS 


In the main text of this report, the probability of R false rejections
in T trials was expressed by the binomial distribution

$$\mathrm{Prob}\{R\} = C^{T}_{R}\, P^{R}\, (1-P)^{T-R}$$

where P is the true error rate. An estimate of P that can be calculated from
a data base of verification trials is

$$\hat{P} = \text{maximum likelihood estimate of } P = \frac{R}{T}$$

What is our confidence that the estimate $\hat{P}$ is a good approximation to
the true error rate P? For simplicity, we assume that T is large* so that
the binomial distribution can be approximated by a Gaussian distribution of
variance P(1-P)/T. It can be shown (Snedecor and Cochran, 1967) that the
probability that P lies between

$$\hat{P} - 1.96\sqrt{\hat{P}(1-\hat{P})/T} \quad\text{and}\quad \hat{P} + 1.96\sqrt{\hat{P}(1-\hat{P})/T}$$

is approximately 95 percent. In other words, if we calculate $\hat{P}$ for a
particular data base, we can be 95 percent certain that the true error rate P
lies between the above limits. The two limits above are sometimes called the 95
percent confidence limits. The 99 percent confidence limits can be calculated
simply by substituting 2.576 for 1.96.

Example: Suppose that 200 false rejects occur in 1,000 trials:

$$\hat{P} = \frac{200}{1000} = 0.2$$


*For small T see Figure E-1.

Snedecor, G. W., and W. G. Cochran, Statistical Methods (Iowa State University
Press, 1967), pp. 210-211.


We can be 99 percent certain that the true error rate is in the range

$$0.2 \pm 2.576\sqrt{(0.2)(0.8)/1000} = 0.2 \pm 0.033$$

That is, we are 99 percent certain that 16.7 percent ≤ P ≤ 23.3 percent.
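
The large-T confidence limits described in this appendix can be computed directly. The sketch below is a minimal illustration using the Gaussian approximation only; the function name `confidence_limits` is hypothetical, and the approximation should not be relied on for small T.

```python
import math

def confidence_limits(false_rejects, trials, z=1.96):
    """Gaussian-approximation confidence limits for the true error rate.

    z = 1.96 gives the 95 percent limits; z = 2.576 gives the 99 percent
    limits. Valid only when `trials` is large.
    """
    p_hat = false_rejects / trials                       # estimate R / T
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

# Example from the text: 200 false rejects in 1,000 trials, 99 percent limits.
low, high = confidence_limits(200, 1000, z=2.576)
```

For these inputs the half-width is about 0.033, so `low` and `high` round to 0.167 and 0.233, matching the range stated in the example.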


FIGURE E-1 CONFIDENCE LIMITS (axis label: ESTIMATED TYPE I ERROR $\hat{P} = R/T$)